Abstract

It is well known that $\ell_1$ minimization can be used to recover sufficiently sparse unknown signals from compressed linear measurements. In fact, exact thresholds on the sparsity, as a function of the ratio between the system dimensions, so that with high probability almost all sparse signals can be recovered from i.i.d. Gaussian measurements, have been computed and are referred to as "weak thresholds" [1]. In this paper, we introduce a reweighted $\ell_1$ recovery algorithm composed of two steps: a standard $\ell_1$ minimization step to identify a set of entries where the signal is likely to reside, and a weighted $\ell_1$ minimization step where entries outside this set are penalized. For signals where the non-sparse component entries are independent and identically drawn from certain classes of distributions (including most well known continuous distributions), we prove a strict improvement in the weak recovery threshold. Our analysis suggests that the level of improvement in the weak threshold depends on the behavior of the distribution at the origin. Numerical simulations verify the distribution dependence of the threshold improvement very well, and suggest that in the case of i.i.d. Gaussian nonzero entries, the improvement can be quite impressive: over 20% in the example we consider.



1 Introduction 

Compressed sensing addresses the problem of recovering sparse signals from under-determined systems of linear equations [2]. In particular, if $x$ is an $n \times 1$ real vector which is known to have at most $k$ nonzero elements where $k < n$, and $A$ is an $m \times n$ measurement matrix with $k < m < n$, then for appropriate values of $k$, $m$ and $n$, it is possible to efficiently recover $x$ from the set of linear projections $y = Ax$ [31 m El [^. The most well recognized such algorithm is $\ell_1$ minimization, which can be formulated as follows:

$$\min_{Az = y} \|z\|_1. \qquad (1)$$

The first result that established the fundamental thresholds of signal recovery using $\ell_1$ minimization is due to Donoho and Tanner [H H], where it is shown that if the measurement matrix is i.i.d. Gaussian, then for a given ratio $\delta = m/n$, $\ell_1$ minimization can successfully recover every $k$-sparse signal, provided that $\mu = k/n$ is smaller than a certain threshold. This statement holds asymptotically as $n \to \infty$ and with high probability. This threshold guarantees the recovery of all sufficiently sparse signals and is therefore referred to as a strong threshold. It does not depend on the actual distribution of the nonzero entries of the sparse signal and as such is a universal result. However, at this point, it is not known whether there exist other polynomial-time algorithms with strong thresholds superior to those of $\ell_1$ minimization.

Another notion introduced and computed in [H 0] is that of a weak threshold, where signal recovery is guaranteed for almost all support sets and almost all sign patterns of the sparse signal, with high probability as $n \to \infty$. The weak threshold is the one that can be observed in simulations of $\ell_1$ minimization and allows for signal recovery beyond the strong threshold. The weak threshold of $\ell_1$ minimization is also universal from the vantage point of signal distribution: the amplitudes of the nonzero entries of a sparse signal do not affect its recoverability by solving (1). In other words, if a sparse signal with a support set $S$ and a particular sign pattern is recoverable using $\ell_1$ minimization, so is every other signal with the same support and sign pattern. It is worth noting that the weak thresholds of $\ell_1$ minimization can be generalized to a broader class of random measurement matrices, including those with null spaces that are random orthant symmetric and generic subspaces (e.g., matrices with i.i.d. Bernoulli or uniform $(-1, 1)$ entries) [7]. Finally, similar to the strong thresholds, it is not known whether there exist other polynomial-time algorithms with weak thresholds superior to those of $\ell_1$ minimization.

Our Contributions. In this paper we prove that a certain two-step reweighted $\ell_1$ algorithm indeed has higher weak recovery guarantees than ordinary $\ell_1$ minimization for particular classes of sparse signals, including sparse Gaussian signals. We previously introduced this algorithm in [8], and proved that it outperforms standard $\ell_1$ minimization for a very restricted class of polynomially decaying sparse signals. In this paper, we extend this result to a much wider and more reasonable class of sparse signals. The key to our result is the fact that for these classes of signals, $\ell_1$ minimization has an approximate support recovery property which can be exploited in a reweighted $\ell_1$ algorithm to obtain a provably superior weak threshold. In particular, we consider Gaussian sparse signals, namely sparse signals in which the nonzero entries are i.i.d. Gaussian. Our analysis of Gaussian sparse signals relies on concentration bounds on the partial sum of their order statistics. Furthermore, we show that for continuous distributions with sufficiently fast decaying tails and nonzero value at the origin, similar improvements for the weak threshold can be postulated. More generally, we show that as long as the nonzero entries of the sparse signal are independently drawn from a continuous distribution $f(\cdot)$ that has a nonzero finite order derivative at the origin, the weak recovery threshold of our proposed two-step reweighted $\ell_1$ algorithm is strictly larger than that of $\ell_1$ minimization. Although not specifically derived, our analysis suggests that the improvement rate is a function of the smallest integer $r$ for which $f^{(r)}(0) \neq 0$: the smaller such $r$ is, the larger the improvement. We perform numerical simulations using various distributions that support this assertion.

It is worth noting that different variations of reweighted $\ell_1$ algorithms have recently been introduced in the literature and have shown experimental improvement over ordinary $\ell_1$ minimization [9], [10]. In [9], approximately sparse signals are considered, where perfect recovery is often not achieved. The question is therefore not that of an explicit recovery threshold extension. Instead, it is shown that the reconstruction error can be reduced using an iterative scheme. In [10], a similar algorithm is suggested and is empirically shown to outperform $\ell_1$ minimization for exactly sparse signals with certain continuous distributions. In particular, it was empirically observed that the proposed algorithm does not improve the signal recovery for sparse vectors with constant amplitude nonzero entries (i.e., a nonzero entry is either 1 or -1). Unfortunately, [10] provides no theoretical analysis or performance guarantees for the success or failure of the method. The particular reweighted $\ell_1$ minimization algorithm that we propose and analyze is of significantly less computational complexity than the earlier ones (it only solves two linear programs). Furthermore, experimental results confirm that it exhibits much better performance than previous reweighted methods. Finally, while we do rigorously establish a strict improvement in the weak threshold, we currently do not have tight bounds on the new weak threshold, and simulation results are far better than the bounds we can provide at this time.

The organization of this paper is as follows. In Section 2 we introduce the basic definitions used throughout the paper. In Section 3 the signal model is described, the notions of strong and weak recovery thresholds are quantified and the main problem is stated, namely to find a polynomial time recovery algorithm with better thresholds than $\ell_1$ minimization for sparse signal recovery. In Section 4 a two-step reweighted linear programming algorithm is described and is claimed to be superior in performance to the regular $\ell_1$ minimization algorithm for sparse vectors with Gaussian distributions (Theorem 4.1). Sections 5 and 6 are dedicated to the detailed proof of this claim, through separate analysis of different stages of the algorithm. In Section 7 these results are generalized to a much broader class of sparsity models beyond Gaussians. The technical discussions of this paper predict that the performance of the proposed algorithm strongly depends on the distribution of the nonzero entries of the random sparse signal model. The paper ends in Section 8 with some numerical evaluations of the proposed algorithm and the verification of the distribution dependent behavior of the reweighted algorithm.

2 Basic Definitions 

A sparse signal with exactly $k$ nonzero entries is called $k$-sparse. For a vector $x$, $\|x\|_1$ denotes the $\ell_1$ norm. The support (set) of $x$, denoted by $\mathrm{supp}(x)$, is the index set of its nonzero coordinates. For a vector $x$ that is not exactly $k$-sparse, we define the $k$-support of $x$ to be the index set of the largest $k$ entries of $x$ in amplitude, and denote it by $\mathrm{supp}_k(x)$. For a subset $K$ of the entries of $x$, $x_K$ means the vector formed by those entries of $x$ indexed in $K$. Finally, $\max|x|$ and $\min|x|$ mean the absolute value of the maximum and minimum entry of $x$ in magnitude, respectively.

3 Signal Model and Problem Description 

We consider sparse random signals with i.i.d. nonzero coefficients drawn from a given continuous distribution (in particular Gaussian). In other words, we assume that the unknown sparse signal is an $n \times 1$ vector $x$ with exactly $k$ nonzero entries, where each nonzero entry is independently drawn from a distribution $f(\cdot)$ (e.g., the standard normal distribution $\mathcal{N}(0, 1)$). The measurement matrix $A$ is an $m \times n$ matrix with i.i.d. Gaussian entries and aspect ratio $\delta = m/n$. The theory of compressed sensing guarantees that if $\mu = k/n$ is smaller than a certain threshold, then for almost all measurement matrices $A$ every $k$-sparse signal can be recovered using $\ell_1$ minimization. The relationship between $\delta$ and the largest value of $\mu$ for which such a guarantee exists is called the strong sparsity threshold, and is denoted by $\mu_S(\delta)$. A more practical performance guarantee is the so-called weak sparsity threshold, denoted by $\mu_W(\delta)$, which has the following interpretation: for a fixed value of $\delta = m/n$ and an i.i.d. Gaussian matrix $A$ of size $m \times n$, a random $k$-sparse vector $x$ of size $n \times 1$ with a randomly chosen support set and a random sign pattern can be recovered from $Ax$ using $\ell_1$ minimization with high probability, if $k/n < \mu_W(\delta)$. In addition, other forms of recovery thresholds can be defined using different constraints and requirements. For example, when the reconstruction of signals with all support sets and almost all sign patterns is considered, the resulting thresholds are called sectional. These thresholds were discussed in [3] for i.i.d. Gaussian matrices. Furthermore, strong and weak thresholds can also be defined and evaluated for the reconstruction of nonnegative signals (see e.g. ) or for alternative classes of matrix ensembles. For example, strong thresholds for $\ell_1$ minimization over expander-graph-based measurement matrices were derived in [12], and in [13] for nonnegative vectors, in addition to weak threshold forms.

In this paper, we consider sparse signals that fall outside the recoverability regime of $\ell_1$ minimization. In other words, we assume that the support size of $x$, namely $k$, is slightly larger than the weak threshold of $\ell_1$ minimization. Specifically, $k = (1 + \epsilon_0)\mu_W(\delta)n$ for some $\epsilon_0 > 0$. This means that if we use $\ell_1$ minimization, a randomly chosen $\mu_W(\delta)n$-sparse signal will be recovered perfectly with very high probability, whereas a randomly selected $k$-sparse signal will not. We would like to show that for a strictly positive $\epsilon_0$, the two-step reweighted $\ell_1$ algorithm of Section 4 can indeed recover a randomly selected $k$-sparse signal with high probability, implying that the proposed method has a superior weak threshold.
Figure 1: A pictorial example of a sparse signal and its $\ell_1$ minimization approximation.

Algorithm 1 Two-Step Reweighted $\ell_1$ Minimization.

1: Input: Measurement matrix $A \in \mathbb{R}^{m \times n}$, measurement vector $y \in \mathbb{R}^{m \times 1}$, integer $k < n$, predetermined real-valued weight $\omega > 1$.
2: Output: Sparse vector $x^*$ with $Ax^* = y$.
3: Solve the $\ell_1$ minimization problem:

$$\hat{x} = \arg\min \|z\|_1 \quad \text{subject to } Az = y. \qquad (2)$$

4: Obtain an approximation for the support set of $x$: find the index set $L \subset \{1, 2, \ldots, n\}$ which corresponds to the largest $k$ elements of $\hat{x}$ in magnitude.
5: Solve the following weighted $\ell_1$ minimization problem and declare the solution as output:

$$x^* = \arg\min \|z_L\|_1 + \omega\|z_{\bar{L}}\|_1 \quad \text{subject to } Az = y. \qquad (3)$$
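
The bookkeeping between the two linear programs of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the helper names `k_support` and `reweighting_vector` are hypothetical, the first-stage solution `x_hat` is a made-up vector rather than the output of an actual solver, and the linear programs themselves are not solved here.

```python
def k_support(x_hat, k):
    """Indices of the k largest-magnitude entries of x_hat (the set L of line 4)."""
    order = sorted(range(len(x_hat)), key=lambda i: abs(x_hat[i]), reverse=True)
    return set(order[:k])


def reweighting_vector(x_hat, k, omega):
    """Per-coordinate weights for problem (3): 1 on L, omega > 1 on its complement."""
    if omega <= 1:
        raise ValueError("omega must exceed 1")
    L = k_support(x_hat, k)
    return [1.0 if i in L else float(omega) for i in range(len(x_hat))]


# Hypothetical first-stage output; in the algorithm it would come from solving (2).
x_hat = [0.1, -2.0, 0.0, 1.5, -0.3]
weights = reweighting_vector(x_hat, k=2, omega=3.0)
print(weights)
```

The weighted objective in (3) is then the weighted sum of $|z_i|$ with these per-coordinate weights, which any linear programming solver can minimize subject to $Az = y$.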



4 Two-Step Weighted $\ell_1$ Algorithm

We propose the following method, outlined in Algorithm 1, consisting of two linear programming steps: a standard $\ell_1$ minimization and a weighted one. The input to the algorithm is the vector $y = Ax$, where $x$ is the unknown $k$-sparse signal with $k = (1 + \epsilon_0)\mu_W(\delta)n$, and the output is an approximation $x^*$ to the unknown vector $x$. We assume that the sparsity $k$ (or an upper bound on it) is known. However, the algorithm assumes no knowledge of the distribution of the nonzero entries of the unknown signal. Also, $\omega > 1$ is a predetermined weight.

The intuition behind the algorithm is as follows. In the first step, a standard $\ell_1$ minimization is performed. If the sparsity of the signal is beyond the weak threshold $\mu_W(\delta)n$, then $\ell_1$ minimization is most probably not capable of recovering the signal. However, we use the output of the $\ell_1$ minimization to identify an index set, $L$, which we "hope" contains most of the nonzero entries of $x$ (see Figure 1). We finally perform a weighted $\ell_1$ minimization by penalizing those entries of $x$ that are not in $L$ (ostensibly because they have a lower chance of being nonzero). Consequently, Algorithm 1 is capable of recovering less sparse signals, or equivalently has a higher weak threshold than that of $\ell_1$ minimization. This intuition is formalized in the following theorem.

Theorem 4.1 (Weak threshold of Algorithm 1). Let $A$ be an $m \times n$ i.i.d. Gaussian matrix with $m/n = \delta$. There exist $\epsilon_0 > 0$ and $\omega > 0$ so that Algorithm 1 perfectly recovers a random $(1 + \epsilon_0)\mu_W(\delta)n$-sparse vector with i.i.d. Gaussian entries with high probability as $n$ grows to infinity.

The interpretation of the above theorem is that for sparse signals whose nonzero entries follow a Gaussian distribution, Algorithm 1 has a recovery threshold beyond that of standard $\ell_1$ minimization. The proof is provided in the next sections as follows. In Section 5 we prove that there is a large overlap between the index set $L$, found in step 2 of the algorithm, and the support set of the unknown signal $x$, denoted by $K$ (see Theorem 5.1 and Figure 1). Then in Section 6 we show that the large overlap between $K$ and $L$ can result in perfect recovery of $x$, beyond the standard weak threshold, when a weighted $\ell_1$ minimization is used in step 3. The formal proof of Theorem 4.1 appears in Section 6.



5 Approximate Support Recovery, Steps 1 and 2 of the Algorithm 

In this section, we carefully study the first two steps of Algorithm 1. The unknown signal $x$ is assumed to be a Gaussian $k$-sparse vector with support set $K$, where $k = |K| = (1 + \epsilon_0)\mu_W(\delta)n$, for some $\epsilon_0 > 0$. By a Gaussian $k$-sparse vector, we mean one where the nonzero entries are i.i.d. Gaussian (zero mean and unit variance, say). It should be noted that the Gaussian distribution is only considered as a standard choice. We later extend our analysis to other signal distributions. The solution $\hat{x}$ to the $\ell_1$ minimization obtained in step 1 of Algorithm 1 is in all likelihood a dense vector. The set $L$, as defined in the algorithm, is the $k$-support set of $\hat{x}$ (i.e., $L = \mathrm{supp}_k(\hat{x})$). We show that for small enough $\epsilon_0$, the intersection of $L$ and $K$ is with high probability very large, so that $L$ can be counted as a good approximation to $K$ (Figure 1).

In order to find a decent lower bound on $|L \cap K|$, we point out three separate facts and establish a connection between them. First, we prove a general lemma that provides a lower bound on the quantity $|L \cap K|$ as a function of $\|x - \hat{x}\|_1$. Then, we discuss a critical property of $\ell_1$ minimization known as weak robustness, which helps provide an upper bound on the quantity $\|x - \hat{x}\|_1$. The robustness result is due to Xu et al. and was first proved in [14]. However, we provide explicit scaling laws for the robustness of $\ell_1$ minimization beyond the implicit results of [14]. Finally, we leverage some concentration results for order statistics to derive explicit formulae for the obtained bounds. These steps are elaborated in the remainder of this section.

Definition 1. For a $k$-sparse signal $x$, we define $W(x, \lambda)$ to be the size of the largest subset of nonzero entries of $x$ that has an $\ell_1$ norm less than or equal to $\lambda$, i.e.,

$$W(x, \lambda) = \max\{|S| \;:\; S \subseteq \mathrm{supp}(x),\ \|x_S\|_1 \leq \lambda\}.$$

Note that $W(x, \lambda)$ is increasing in $\lambda$.
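
As a small illustration of Definition 1, the quantity $W(x, \lambda)$ can be computed greedily: the largest subset with $\ell_1$ norm at most $\lambda$ is obtained by taking the nonzero entries in increasing order of magnitude. The function name `W` mirrors the notation of the definition; the example vector is made up for illustration.

```python
def W(x, lam):
    """Size of the largest subset of nonzero entries of x whose l1 norm is <= lam."""
    magnitudes = sorted(abs(v) for v in x if v != 0)
    total, count = 0.0, 0
    for m in magnitudes:
        if total + m > lam:
            break  # adding any remaining (larger) entry would exceed lam
        total += m
        count += 1
    return count


x = [0.0, 3.0, -0.5, 0.0, 1.0, -2.0]
print(W(x, 0.4), W(x, 1.6), W(x, 100.0))
```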

Lemma 5.1. Let $x$ be a $k$-sparse vector and $\hat{x}$ be another vector. Also, let $K$ be the support set of $x$ and $L$ be the $k$-support set of $\hat{x}$. Then

$$|K \cap L| \geq k - W(x, \|x - \hat{x}\|_1). \qquad (4)$$



Proof. Let $x_i$ be the $i$th entry of $x$ and $e^* = (e^*_1, e^*_2, \ldots, e^*_n)^T$ be the solution to the following minimization problem:

$$\text{minimize } \|e\|_1 \quad \text{s.t. } \max|(x + e)_{K \setminus L}| \leq \min|(x + e)_L|, \qquad (5)$$

where $K \setminus L$ denotes the subset of the entries of $K$ that are not in $L$. Note that the vector $\hat{x} - x$ satisfies the constraint of the minimization problem (5). This is because $x + (\hat{x} - x) = \hat{x}$ and $L$ is the $k$-support of $\hat{x}$. Therefore every entry of $\hat{x}$ outside the set $L$ is smaller in amplitude than every entry inside $L$. Since $e^*$ is the optimal solution of (5), we must have:

$$\|e^*\|_1 \leq \|x - \hat{x}\|_1. \qquad (6)$$

Let $a = \max|(x + e^*)_{K \setminus L}|$. Then for each $i \in K \setminus L$, using the triangle inequality we have

$$|x_i| - |e^*_i| \leq |x_i + e^*_i| \leq a, \quad \forall i \in K \setminus L, \qquad (7)$$

and so:

$$|e^*_i| \geq \max(|x_i| - a, 0), \quad \forall i \in K \setminus L. \qquad (8)$$

Therefore, by summing up the inequalities in (8) for $i \in K \setminus L$ we have

$$\|e^*_{K \setminus L}\|_1 \geq \sum_{i \in K \setminus L,\ |x_i| > a} (|x_i| - a). \qquad (9)$$

On the other hand, for all $i \in L \setminus K$ we have $x_i = 0$, so the constraint in (5) gives $|e^*_i| \geq a$, and therefore:

$$\|e^*_{L \setminus K}\|_1 \geq a\,|L \setminus K|. \qquad (10)$$

But $|L \setminus K| = |K \setminus L|$ and hence it follows that

$$\|e^*\|_1 \geq \|e^*_{K \setminus L}\|_1 + \|e^*_{L \setminus K}\|_1 \geq a\,|K \setminus L| + \sum_{i \in K \setminus L,\ |x_i| > a} (|x_i| - a) \geq \sum_{i \in K \setminus L} |x_i| = \|x_{K \setminus L}\|_1. \qquad (11)$$

(6) and (11) together imply that $\|x - \hat{x}\|_1 \geq \|x_{K \setminus L}\|_1$, which by definition means that $W(x, \|x - \hat{x}\|_1) \geq |K \setminus L|$. □
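
The bound (4) of Lemma 5.1 holds for any pair of vectors, so it can be checked numerically without solving a linear program. The sketch below is an assumption-laden illustration: the approximation `x_hat` is simply the true signal plus small dense noise (a stand-in for the output of $\ell_1$ minimization), and the helper names are hypothetical.

```python
import random


def W(x, lam):
    """Largest number of nonzero entries of x whose l1 norm is <= lam."""
    total, count = 0.0, 0
    for m in sorted(abs(v) for v in x if v != 0):
        if total + m > lam:
            break
        total += m
        count += 1
    return count


def k_support(v, k):
    """Indices of the k largest-magnitude entries of v."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    return set(order[:k])


rng = random.Random(0)
n, k = 200, 20
K = set(rng.sample(range(n), k))                      # true support
x = [rng.gauss(0.0, 1.0) if i in K else 0.0 for i in range(n)]
x_hat = [xi + rng.gauss(0.0, 0.01) for xi in x]       # stand-in dense approximation
L = k_support(x_hat, k)

err = sum(abs(a - b) for a, b in zip(x, x_hat))       # ||x - x_hat||_1
overlap = len(K & L)
bound = k - W(x, err + 1e-12)                         # small slack guards against rounding
print(overlap, bound)
```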

We now introduce the notion of weak robustness, which allows us to bound $\|x - \hat{x}\|_1$, and has the following formal definition [14].

Definition 2. Let the set $S \subset \{1, 2, \ldots, n\}$ and the subvector $x_S$ be fixed. An approximation $\hat{x}$ to $x$ is called weakly robust with respect to the set $S$ if, for some $C_S > 1$, it holds that

$$\|(x - \hat{x})_{\bar{S}}\|_1 \leq \frac{2C_S}{C_S - 1}\|x_{\bar{S}}\|_1, \qquad (12)$$

and

$$\|x_S\|_1 - \|\hat{x}_S\|_1 \leq \frac{2}{C_S - 1}\|x_{\bar{S}}\|_1. \qquad (13)$$

$C_S$ is called the robustness parameter of the considered approximation for the set $S$.

The weak robustness notion allows us to bound the error $\|x - \hat{x}\|_1$ in the following way. If $\hat{x}$ is a weakly robust approximation to $x$ with respect to the set $S$ and parameter $C_S > 1$, such that $A\hat{x} = Ax$, and if the matrix $A_S$ obtained by retaining only those columns of $A$ that are indexed by $S$ has full column rank, then the quantity

$$\kappa = \max_{Aw = 0,\ w \neq 0} \frac{\|w_S\|_1}{\|w_{\bar{S}}\|_1}$$

must be finite, and one can conclude that

$$\|x - \hat{x}\|_1 \leq \frac{2C_S(1 + \kappa)}{C_S - 1}\|x_{\bar{S}}\|_1. \qquad (14)$$

This result is due to [14], where in addition it has been shown that for Gaussian i.i.d. measurement matrices $A$, the solution of $\ell_1$ minimization provides a weakly robust approximation with high probability. In other words, for a randomly chosen subset $S$ with $|S|/n < \mu_W(\delta)$, there exists a robustness factor $C > 1$ as a function of $|S|/n$ for which (12) and (13) hold with high probability for an arbitrary vector $x$, where $\hat{x}$ is the solution obtained by $\ell_1$ minimization. Now let $k_1 = (1 - \epsilon_1)\mu_W(\delta)n$ for some small $\epsilon_1 > 0$, and $K_1$ be the $k_1$-support set of $x$, namely, the set of the largest $k_1$ entries of $x$ in magnitude. Based on equation (14) we may write

$$\|x - \hat{x}\|_1 \leq \frac{2C(\epsilon_1)(1 + \kappa)}{C(\epsilon_1) - 1}\|x_{\bar{K}_1}\|_1, \qquad (15)$$

where for a fixed value of $\delta$, we have emphasized that the constant $C$ for the set $K_1$ is a function of $\epsilon_1$. Furthermore, $C(\epsilon_1)$ becomes arbitrarily close to 1 as $\epsilon_1 \to 0$. $\kappa$ is also a bounded function of $\epsilon_1$ and therefore we may replace it with an upper bound $\kappa^*$. This provides a bound on $\|x - \hat{x}\|_1$. To explore this inequality and understand its asymptotic behavior, we apply a third result, which is a certain concentration bound on the order statistics of Gaussian random variables.

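Lemma 5.2. Suppose $X_1, X_2, \ldots, X_N$ are $N$ i.i.d. $\mathcal{N}(0, 1)$ random variables. Let $S_N = \sum_{i=1}^{N}|X_i|$ and let $S_M$ be the sum of the largest $M$ numbers among the $|X_i|$'s, for each $1 \leq M \leq N$. Then for every $\epsilon > 0$ sufficiently small, as $N \to \infty$, if the ratio $M/N$ is kept constant, we have

$$P\left(\left|\frac{S_N}{N} - \sqrt{\frac{2}{\pi}}\right| > \epsilon\right) \to 0, \qquad (16)$$

$$P\left(\left|\frac{S_M}{S_N} - \exp\left(-\frac{\Psi^2\left(\frac{M}{2N}\right)}{2}\right)\right| > \epsilon\right) \to 0, \qquad (17)$$

where $\Psi(x) = Q^{-1}(x)$ with $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-y^2/2}\,dy$.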

To make the paper more readable, we outline the general idea of the proof of the above lemma here. The detailed proof is given in Appendix A. For a particular instance of $X_1, \ldots, X_N$, if $0 < a < 1$ is such that exactly a fraction $M/N$ of the $|X_i|$'s are larger than $a$, then every $|X_i|$ which is larger than $a$ contributes to the sum $S_M$. Therefore $S_M$ can be thought of as the sum of those $|X_i|$'s that are larger than $a$. This can be expressed in another way. Let $\mathcal{X}_i$ be a random variable which is equal to $|X_i|$ if $|X_i| > a$ and is 0 otherwise. We therefore conclude that $S_M$ is equal to the sum $\sum_{i=1}^{N}\mathcal{X}_i$. Furthermore, when $N$ is large, it can be shown using concentration lemmas that $a$ will be arbitrarily close to the fixed number $\Psi\left(\frac{M}{2N}\right)$, and thus the distributions of the $\mathcal{X}_i$'s converge to the same distribution, namely the truncated absolute value of a normal distribution. Besides, when $a$ is constant the $\mathcal{X}_i$'s are independent, and therefore one can apply the law of large numbers to conclude that $S_M/S_N \approx \mathbb{E}\mathcal{X}_1/\mathbb{E}|X_1|$, which is the desired conclusion. These arguments are made rigorous in Appendix A.
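
The limit in Lemma 5.2 can be sanity-checked by simulation. The sketch below is illustrative only: it assumes the limit $\exp(-\Psi^2(M/(2N))/2)$ with $\Psi = Q^{-1}$ the inverse of the standard normal tail probability, and it uses hypothetical helper names (`psi`, `predicted_ratio`, `empirical_ratio`) together with an arbitrary sample size and seed.

```python
import math
import random
from statistics import NormalDist


def psi(p):
    """Psi(p) = Q^{-1}(p), where Q is the standard normal tail probability."""
    return NormalDist().inv_cdf(1.0 - p)


def predicted_ratio(frac):
    """Limiting value of S_M / S_N when M / N = frac."""
    return math.exp(-0.5 * psi(0.5 * frac) ** 2)


def empirical_ratio(N, M, rng):
    """S_M / S_N for one draw of N standard normal variables."""
    mags = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(N)), reverse=True)
    return sum(mags[:M]) / sum(mags)


rng = random.Random(1)
N, M = 20000, 10000
emp = empirical_ratio(N, M, rng)
pred = predicted_ratio(M / N)
print(round(emp, 3), round(pred, 3))
```

The prediction follows from $\mathbb{E}[|X|\,\mathbf{1}\{|X| > a\}]/\mathbb{E}|X| = e^{-a^2/2}$ for a standard normal $X$, with $a$ chosen so that $2Q(a) = M/N$.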

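Recall that we assumed that $x$ is a $k$-sparse random Gaussian signal with $k = (1 + \epsilon_0)\mu_W(\delta)n$, and we defined $K_1$ to be the $k_1$-support of $x$, where $k_1 = (1 - \epsilon_1)\mu_W(\delta)n$. We denoted by $K$ the support set of $x$. Also, if $\hat{x}$ is the approximation to $x$ obtained by $\ell_1$ minimization, we denoted by $L$ the $k$-support set of $\hat{x}$. As a direct consequence of Lemma 5.2 we can write:

$$P\left(\left|\frac{\|x_{\bar{K}_1}\|_1}{\|x\|_1} - \left(1 - e^{-0.5\Psi^2\left(0.5\frac{1 - \epsilon_1}{1 + \epsilon_0}\right)}\right)\right| > \epsilon\right) \to 0, \qquad (18)$$

for $\epsilon > 0$ sufficiently small as $n \to \infty$. Define

$$\zeta(\epsilon_0) := \inf_{\epsilon_1 > 0} \frac{2C(\epsilon_1)(1 + \kappa^*)}{C(\epsilon_1) - 1}\left(1 - e^{-0.5\Psi^2\left(0.5\frac{1 - \epsilon_1}{1 + \epsilon_0}\right)}\right). \qquad (19)$$

Incorporating (15) into (18) we may write

$$P\left(\frac{\|x - \hat{x}\|_1}{\|x\|_1} - \zeta(\epsilon_0) < \epsilon\right) \to 1, \qquad (20)$$

for $\epsilon > 0$ sufficiently small as $n \to \infty$.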

Let us summarize our conclusions so far. First, we were able to show that $|K \cap L| \geq k - W(x, \|x - \hat{x}\|_1)$. The weak robustness of $\ell_1$ minimization and the Gaussianity of the signal then led us to the fact that for large $n$, with high probability, $\|x - \hat{x}\|_1 \leq \zeta(\epsilon_0)\|x\|_1$. These results build up to the next key theorem, which is the conclusion of this section.

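Theorem 5.1 (Approximate Support Recovery). Let $A$ be an i.i.d. Gaussian $m \times n$ measurement matrix with $m/n = \delta$. Let $k = (1 + \epsilon_0)\mu_W(\delta)n$ and $x$ be an $n \times 1$ random Gaussian $k$-sparse signal. Suppose that $\hat{x}$ is the approximation to $x$ given by $\ell_1$ minimization, i.e., $\hat{x} = \arg\min_{Az = Ax}\|z\|_1$. Then, as $n \to \infty$, for all $\epsilon > 0$,

$$P\left(\frac{|K \cap L|}{k} - 2Q\left(\sqrt{-2\log(1 - \zeta(\epsilon_0))}\right) > -\epsilon\right) \to 1, \qquad (21)$$

where $\zeta(\cdot)$ is defined in (19).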



Before proving the above theorem, we state the following useful lemma, the proof of which is given in Appendix B.

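Lemma 5.3. Let $x$ be a random $k$-sparse Gaussian vector of size $n$, and $0 < \alpha < 1$. For any positive $\epsilon$, the following happens with high probability as $n, k \to \infty$:

$$\frac{W(x, \alpha\|x\|_1)}{k} < \left(1 - 2Q\left(\sqrt{-2\log(1 - \alpha)}\right)\right) + \epsilon. \qquad (22)$$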



Proof of Theorem 5.1. From equation (20), for every $\epsilon' > 0$ and large enough $n$, with high probability we have $\|x - \hat{x}\|_1 < (\zeta(\epsilon_0) + \epsilon')\|x\|_1$. Therefore, from Lemma 5.1 and the fact that $W(x, \lambda)$ is increasing in $\lambda$, $|K \cap L| \geq k - W(x, (\zeta(\epsilon_0) + \epsilon')\|x\|_1)$ with high probability. Replacing $W(x, (\zeta(\epsilon_0) + \epsilon')\|x\|_1)$ with the upper bound given by Lemma 5.3, it follows that with very high probability $\frac{|K \cap L|}{k} \geq 2Q\left(\sqrt{-2\log(1 - \zeta(\epsilon_0) - \epsilon')}\right) - \epsilon$. We can now let $\epsilon'$ go to zero and the proof is completed. □

Note that if $\lim_{\epsilon_0 \to 0}\zeta(\epsilon_0) = 0$, then Theorem 5.1 implies that $\frac{|K \cap L|}{k}$ becomes arbitrarily close to 1, which means that using $\ell_1$ minimization it is possible to closely estimate the support set of $x$. We show in the sequel that this is in fact the case.
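
To see how the guaranteed overlap behaves as the relative error shrinks, one can evaluate the expression $2Q(\sqrt{-2\log(1 - \zeta)})$ appearing in Theorem 5.1 for a few values of $\zeta$. This is a minimal sketch: the function names are hypothetical and the values of $\zeta$ are arbitrary, since the paper does not give numerical values for $\zeta(\epsilon_0)$.

```python
import math
from statistics import NormalDist


def Q(t):
    """Standard normal tail probability."""
    return 1.0 - NormalDist().cdf(t)


def small_entry_fraction(alpha):
    """Limit in Lemma 5.3: fraction of nonzero entries that together carry
    a fraction alpha of the l1 norm, counted from the smallest entries."""
    a = math.sqrt(-2.0 * math.log(1.0 - alpha))
    return 1.0 - 2.0 * Q(a)


def overlap_lower_bound(zeta):
    """Asymptotic lower bound on |K ∩ L| / k for a relative l1 error zeta."""
    return 1.0 - small_entry_fraction(zeta)


for zeta in (0.0, 0.01, 0.1, 0.5):
    print(zeta, round(overlap_lower_bound(zeta), 3))
```

The bound equals 1 at $\zeta = 0$ and decreases as $\zeta$ grows, which matches the remark that the support estimate becomes accurate when $\zeta(\epsilon_0) \to 0$.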



5.1 Scaling Law of $\ell_1$ Minimization

In order to show that the robust approximation of the sparse signal at step 1 of Algorithm 1 leads to perfect recovery at step 3, we need to obtain an explicit bound for the term $\zeta(\epsilon_0)$. This in turn requires an explicit relationship between the robustness parameter $C(\epsilon_1)$ and the back-off fraction $\epsilon_1$. For i.i.d. Gaussian matrices, we derive an explicit lower bound on $C(\epsilon_1)$ as a function of $\epsilon_1$ through the following theorem, the proof of which appears in Appendix D.

Theorem 5.2 (Scaling law of $\ell_1$ minimization for Gaussians). Let $A$ be an $m \times n$ i.i.d. Gaussian matrix with $m = \delta n$, and $\mu_W(\delta)$ be the weak recovery threshold of $\ell_1$ minimization for $A$. For sufficiently large $n$, the (weak) robustness parameter $C(\epsilon_1)$ for a randomly chosen $k_1$-support $K_1$ of size $k_1 = (1 - \epsilon_1)\mu_W(\delta)n$ (see equation (15)) satisfies:

$$C(\epsilon_1) \geq \frac{1}{\sqrt{1 - \epsilon_1}}. \qquad (23)$$



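We now derive an asymptotic upper bound on the term $\zeta(\epsilon_0)$ using the above relationship. Replacing the bound of (23) in the definition of $\zeta(\epsilon_0)$, we obtain:

$$\zeta(\epsilon_0) = \inf_{\epsilon_1 > 0} \frac{2C(\epsilon_1)(1 + \kappa^*)}{C(\epsilon_1) - 1}\left(1 - e^{-0.5\Psi^2\left(0.5\frac{1 - \epsilon_1}{1 + \epsilon_0}\right)}\right)$$

$$\leq \inf_{\epsilon_1 > 0} \frac{2(1 + \kappa^*)}{1 - \sqrt{1 - \epsilon_1}}\left(1 - e^{-0.5\Psi^2\left(0.5\frac{1 - \epsilon_1}{1 + \epsilon_0}\right)}\right) \qquad (24)$$

$$\leq \frac{4(1 + \kappa^*)}{\epsilon_0}\left(1 - e^{-0.5\Psi^2\left(0.5\frac{1 - \epsilon_0}{1 + \epsilon_0}\right)}\right), \qquad (25)$$

where (24) uses the fact that $C/(C - 1)$ is decreasing in $C$, and (25) is obtained by simply taking $\epsilon_1 = \epsilon_0$ and using the fact that $\frac{1}{1 - \sqrt{1 - \epsilon_0}} \leq 2/\epsilon_0$. We use the Taylor approximation of the inverse error function to bound the right hand side of (25). Note that:

$$\Psi\left(0.5\frac{1 - \epsilon_0}{1 + \epsilon_0}\right) = \sqrt{2}\,\mathrm{erf}^{-1}\left(\frac{2\epsilon_0}{1 + \epsilon_0}\right) \qquad (26)$$

$$= \sqrt{2\pi}\,\epsilon_0 + O(\epsilon_0^2). \qquad (27)$$

It follows that:

$$\zeta(\epsilon_0) \leq 4\pi(1 + \kappa^*)\epsilon_0 + O(\epsilon_0^2), \qquad (28)$$

as $\epsilon_0 \to 0$. Therefore, we can immediately see that $\lim_{\epsilon_0 \to 0}\zeta(\epsilon_0) = 0$.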
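
The small-$\epsilon_0$ behavior used in (26) to (28) can be checked numerically. The sketch below assumes the forms $\Psi(0.5\frac{1-\epsilon_0}{1+\epsilon_0}) \approx \sqrt{2\pi}\,\epsilon_0$ and a right hand side of (25) equal to $\frac{4(1+\kappa^*)}{\epsilon_0}(1 - e^{-0.5\Psi^2})$; because $\kappa^*$ is not given numerically, the common factor $(1 + \kappa^*)$ is divided out. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def psi(p):
    """Psi(p) = Q^{-1}(p) for the standard normal tail probability Q."""
    return NormalDist().inv_cdf(1.0 - p)


def psi_at(eps0):
    """Psi evaluated at 0.5 * (1 - eps0) / (1 + eps0)."""
    return psi(0.5 * (1.0 - eps0) / (1.0 + eps0))


def rhs_25_scaled(eps0):
    """Right hand side of (25) divided by (1 + kappa*)."""
    a = psi_at(eps0)
    return (4.0 / eps0) * (-math.expm1(-0.5 * a * a))


for eps0 in (0.1, 0.01, 0.001):
    ratio_psi = psi_at(eps0) / (math.sqrt(2.0 * math.pi) * eps0)
    ratio_bound = rhs_25_scaled(eps0) / (4.0 * math.pi * eps0)
    print(eps0, round(ratio_psi, 4), round(ratio_bound, 4))
```

Both ratios approach 1 as $\epsilon_0$ decreases, consistent with the linear decay of the bound in $\epsilon_0$.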

6 Perfect Recovery, Step 3 of the Algorithm 

In Section 5 we showed that if $\epsilon_0$ is small, the $k$-support of $\hat{x}$, namely $L = \mathrm{supp}_k(\hat{x})$, has a significant overlap with the true support of $x$. We even found a quantitative lower bound on the size of this overlap in Theorem 5.1. In step 3 of Algorithm 1, weighted $\ell_1$ minimization is used, where the entries in $\bar{L}$ are assigned a higher weight than those in $L$. In [IB], we have been able to analyze the performance of such weighted $\ell_1$ minimization algorithms. The idea is that if a sparse vector $x$ can be partitioned into two sets $L$ and $\bar{L}$, where in one set the fraction of nonzeros is much larger than in the other set, then (3) can potentially recover $x$ with an appropriate choice of the weight $\omega > 1$, even though $\ell_1$ minimization cannot. The following theorem can be deduced from the computations of [17].

Theorem 6.1. Let $L \subset \{1, 2, \ldots, n\}$, $\omega > 1$ and the fractions $f_1, f_2 \in [0, 1]$ be given. Let $\gamma_1 = |L|/n$ and $\gamma_2 = 1 - \gamma_1$. There exists a threshold $\lambda_c(\gamma_1, \gamma_2, f_1, f_2, \omega)$ such that with high probability, almost all random sparse vectors $x$ with at least $f_1\gamma_1 n$ nonzero entries over the set $L$, and at most $f_2\gamma_2 n$ nonzero entries over the set $\bar{L}$, can be perfectly recovered using $\min_{Az = Ax}\|z_L\|_1 + \omega\|z_{\bar{L}}\|_1$, where $A$ is a $\lambda_c n \times n$ matrix with i.i.d. Gaussian entries.

For completeness, in Appendix C we provide the calculation of $\lambda_c(\gamma_1, \gamma_2, f_1, f_2, \omega)$, based on the calculations of [17]. A software package for computing such thresholds can also be found in



Proof of Theorem 4.1. Recall that the solution of $\ell_1$ minimization in the first stage of Algorithm 1 is the vector $\hat{x}$. We denoted by $L$ the $k$-support set of $\hat{x}$, and by $\bar{L}$ its complement set. The last stage of the algorithm is a weighted $\ell_1$ minimization that puts more weight on the entries outside the set $L$. The justification for this is the fact that the fraction of the nonzero entries of the target signal $x$ over the set $L$ is supposedly larger than the fraction of the nonzero entries over $\bar{L}$. Let us denote these fractions by $f_1$ and $f_2$ respectively, namely $f_1 = \frac{|K \cap L|}{|L|}$ and $f_2 = \frac{|K \cap \bar{L}|}{|\bar{L}|}$, where $K$ is the support of the target signal, unknown to the algorithm before running the weighted $\ell_1$ minimization of the last stage. Since we are using a weighted $\ell_1$ minimization, $x$ will be recovered perfectly with high probability if the number of measurements is larger than the threshold of weighted $\ell_1$ minimization for the nonuniform sparsity model of the target signal, namely if:

$$\lambda_c\left(\frac{k}{n}, 1 - \frac{k}{n}, f_1, f_2, \omega\right) < \delta, \qquad (29)$$

where $\lambda_c$ was defined in Theorem 6.1 and was characterized in [17]. On the other hand, through Theorem 5.1, we provided a lower bound on $f_1$ (and consequently an upper bound on $f_2$) and we showed that as $\epsilon_0 \to 0$, $f_1$ converges to 1 (and consequently $f_2$ approaches zero). The asymptotic value of $\lambda_c\left(\frac{k}{n}, 1 - \frac{k}{n}, f_1, f_2, \omega\right)$ will therefore be equal to $\lambda_c(\mu_W(\delta), 1 - \mu_W(\delta), 1, 0, \omega)$ as $\epsilon_0 \to 0$ (recall that $k = (1 + \epsilon_0)\mu_W(\delta)n$). Furthermore, from the computations of [17], it can be shown that $\lambda_c(\mu_W(\delta), 1 - \mu_W(\delta), 1, 0, \omega) < \delta$ for an appropriate choice of $\omega > 1$, and that for a fixed $\omega$, the function $\lambda_c(\gamma_1, \gamma_2, f_1, f_2, \omega)$ is a continuous function of $\gamma_1$, $f_1$ and $f_2$. Furthermore, $k$, the lower bound on $f_1$ and the upper bound on $f_2$ obtained from Theorem 5.1 are all continuous functions of $\epsilon_0$ in this case. Therefore, we can conclude that for a strictly positive $\epsilon_0$ and corresponding overlap fractions $f_1$ and $f_2$, $\lambda_c((1 + \epsilon_0)\mu_W(\delta), 1 - (1 + \epsilon_0)\mu_W(\delta), f_1, f_2, \omega) < \delta$. This means that for some strictly positive $\epsilon_0$ the number of measurements that is required to reconstruct the signal precisely in the last stage of the algorithm is less than the number of measurements in $A$, i.e., $x$ will be recovered with high probability, despite the fact that it has more nonzero entries than the weak threshold of $\ell_1$ minimization. This completes the proof. □
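
The two fractions that drive the proof, the density of true nonzeros inside the estimated set $L$ and inside its complement, are simple to compute once $K$ and $L$ are given. The following sketch uses a hypothetical helper name and made-up index sets purely for illustration.

```python
def overlap_fractions(K, L, n):
    """Return (f1, f2): the fraction of indices of L that lie in K, and the
    fraction of indices of the complement of L that lie in K."""
    K, L = set(K), set(L)
    L_bar = set(range(n)) - L
    f1 = len(K & L) / len(L)
    f2 = len(K & L_bar) / len(L_bar)
    return f1, f2


# Made-up example: 4 true nonzeros, 3 of which are captured by L.
f1, f2 = overlap_fractions(K={0, 1, 2, 3}, L={0, 1, 2, 7}, n=10)
print(f1, f2)
```

A large gap between `f1` and `f2` is the situation in which the weighted minimization of step 3 is expected to help.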



7 Generalization to Beyond Gaussians 

The theoretical threshold improvement of the proposed iterative $\ell_1$ minimization algorithm was demonstrated for the case of i.i.d. Gaussian matrices, and sparse vectors with independent Gaussian nonzero entries. It is reasonable to ask if we can extend these results to sparse signals with other distributions. We address this problem in this section. In summary, we prove that the theoretical threshold improvement can be generalized to sparse signals whose nonzero entries obey a more general class of distributions, namely continuous symmetric distributions with a non-vanishing finite order derivative at the origin. This is outlined in the following section.

7.1 Arbitrary Distributions 

The attentive reader will note that the only step where we used the Gaussianity of the signal in the proof of threshold improvement was in the order statistics results of Lemma 5.2. This result has the following interpretation. For i.i.d. random variables, the ratio $S_M/S_N$ can be approximated by a known function of $M/N$. In the Gaussian case, this function behaves as $1 - (1 - \frac{M}{N})^2$, as $M \to N$. For constant magnitude signals (say BPSK), the function behaves as $\frac{M}{N}$, for $M \to N$, which predicts that the reweighted method yields no improvement. A more careful analysis reveals that the improvement over $\ell_1$ minimization depends on the behavior of $S_M/S_N$ as $M \to N$, which in turn depends on the smallest order $r$ for which $f^{(r)}(0) \neq 0$, i.e., the smallest $r$ such that the $r$-th derivative of the distribution at the origin is nonzero. We formalize these results by generalizing the arguments of the previous section. First, we present a generalization of Lemma 5.2 for arbitrary symmetric distributions.



Lemma 7.1. Suppose $X, X_1, X_2, \ldots, X_N$ are i.i.d. random variables, drawn from a symmetric distribution $f(\cdot)$. Let $S_N = \sum_{i=1}^{N} |X_i|$ and let $S_M$ be the sum of the largest $M$ numbers among the $|X_i|$'s, for each $1 \le M \le N$. If $f(\cdot)$ is integrable, and if for every finite $a > 0$, the integral $\int_a^{\infty} x f(x)\,dx$ is finite, then for every $\epsilon > 0$ sufficiently small, as $N \to \infty$ and the ratio $M/N$ is kept constant, the following holds

$$P\left(\left|\frac{S_M}{S_N} - \left(1 - \frac{2\int_0^{\Psi_f\left(\frac{M}{2N}\right)} x f(x)\,dx}{E|X|}\right)\right| > \epsilon\right) \to 0, \tag{30}$$

where $\Psi_f(x) = Q_f^{-1}(x)$ with $Q_f(x) = \int_x^{\infty} f(y)\,dy$.
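As an illustrative sketch (not part of the original analysis), the limit in Lemma 7.1 can be evaluated numerically for a given symmetric density. The helper name `limit_ratio`, the truncation point `upper` and the midpoint-rule discretization are assumptions made for this illustration. The function locates the level $a$ with $2Q_f(a) = M/N$ and returns $1 - \int_0^a x f(x)\,dx / \int_0^\infty x f(x)\,dx$. For a standard Gaussian the result should agree with $e^{-a^2/2}$, and for the uniform density on $(-1,1)$ with $1-(1-M/N)^2$.

```python
import math
from statistics import NormalDist

def limit_ratio(pdf, frac, upper=12.0, steps=100000):
    """Approximate the limit of S_M/S_N for a symmetric pdf, with frac = M/N.

    Uses midpoint sums on [0, upper]: finds a with 2*Q_f(a) = frac, then returns
    1 - int_0^a x f(x) dx / int_0^inf x f(x) dx.
    """
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    mass = [pdf(x) * h for x in xs]              # probability mass of each cell
    moment = [x * w for x, w in zip(xs, mass)]   # contribution to int x f(x) dx
    target = 0.5 - frac / 2.0                    # int_0^a f(x) dx = 1/2 - Q_f(a)
    acc_mass = acc_moment = 0.0
    for w, mom in zip(mass, moment):
        if acc_mass + w > target:
            break
        acc_mass += w
        acc_moment += mom
    return 1.0 - acc_moment / sum(moment)

gauss = NormalDist()
frac = 0.7
a = gauss.inv_cdf(1.0 - frac / 2.0)              # a = Psi(M/(2N)) for the Gaussian
print(limit_ratio(gauss.pdf, frac), math.exp(-a * a / 2.0))
print(limit_ratio(lambda x: 0.5 if abs(x) <= 1.0 else 0.0, frac), 1 - (1 - frac) ** 2)
```

The two printed pairs should be close to each other, up to the discretization error of the midpoint rule.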



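Using the above lemma, we can modify the concentration term of equation (18) for the term $\frac{\|x_{\bar K}\|_1}{\|x\|_1}$, where the distribution of the nonzero entries of $x$ is $f(\cdot)$. The resulting concentration thus becomes:

$$P\left(\left|\frac{\|x_{\bar K}\|_1}{\|x\|_1} - \frac{2\int_0^{\Psi_f\left(\frac{1-\epsilon_1}{2(1+\epsilon_0)}\right)} x f(x)\,dx}{E|X|}\right| > \epsilon\right) \to 0, \tag{31}$$

which, when put together with the bound of (15), results in (note that the bound in (15) is independent of the distribution of $x$):

$$P\left(\frac{\|x-\hat{x}\|_1}{\|x\|_1} - C_f(\epsilon_0) < \epsilon\right) \to 1 \tag{32}$$

for every $\epsilon > 0$. Here $C_f(\epsilon_0)$ is defined by:

$$C_f(\epsilon_0) := \inf_{\epsilon_1 > 0} \frac{2C(\epsilon_1)(1+\kappa^*)}{C(\epsilon_1) - 1} \cdot \frac{2\int_0^{\Psi_f\left(\frac{1-\epsilon_1}{2(1+\epsilon_0)}\right)} x f(x)\,dx}{E|X|}. \tag{33}$$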



Consequently, following similar arguments as in the proof of Theorem 5.1, we can state the following theorem as a generalization of the approximate support recovery of $\ell_1$ minimization for arbitrary distributions, the proof of which is immediate.

Theorem 7.1 (Approximate Support Recovery/Generalization). Let $A$ be an i.i.d. Gaussian $m \times n$ measurement matrix with $\frac{m}{n} = \delta$. Let $k = (1+\epsilon_0)\mu_W(\delta)n$ and $x$ be an $n \times 1$ $k$-sparse signal whose nonzero entries are independently drawn from a distribution $f(\cdot)$ which satisfies the conditions of Lemma 7.1. Suppose that $\hat{x}$ is the approximation to $x$ given by the $\ell_1$ minimization, i.e. $\hat{x} = \arg\min_{Az=Ax}\|z\|_1$. Then, as $n \to \infty$, for $\epsilon > 0$ sufficiently small, we have

$$P\left(\frac{|\mathrm{supp}(x) \cap \mathrm{supp}_k(\hat{x})|}{k} - 2Q_f\left(\sqrt{-2\log(1 - C_f(\epsilon_0))}\right) > -\epsilon\right) \to 1, \tag{34}$$

where $C_f(\cdot)$ is defined in (33) and $\mathrm{supp}_k(\hat{x})$ denotes the $k$-support set of $\hat{x}$.

Note that $Q_f(\cdot)$ is always a decreasing function which is equal to $\frac{1}{2}$ at the origin for symmetric distributions. Therefore, the overlap fraction given by Theorem 7.1 can be arbitrarily close to 1, provided that $C_f(\epsilon_0)$ is sufficiently small. The key to drawing further conclusions from the above bound is therefore to derive a bound on the term $C_f(\epsilon_0)$, and show that it becomes arbitrarily small. For BPSK signals for instance, the term $\frac{\|x_{\bar K}\|_1}{\|x\|_1}$ is always equal to $\epsilon_0$, and therefore we cannot guarantee that $C_f(\epsilon_0)$ vanishes asymptotically as $\epsilon_0 \to 0$ based on (33). In fact we prove that $\lim_{\epsilon_0 \to 0} C_f(\epsilon_0) = 0$ for distributions $f(\cdot)$ for which one of the finite order derivatives at the origin is nonzero, stated formally in the following lemma:

Lemma 7.2. Let $f(\cdot)$ be a symmetric distribution which satisfies the conditions of Lemma 7.1. If for some integer $r \ge 0$, the $r$'th order derivative of $f(\cdot)$ at the origin exists and does not vanish, i.e., $f^{(r)}(0) \neq 0$, then $C_f(\epsilon_0) = O(\epsilon_0^{1/(r+1)})$ as $\epsilon_0 \to 0$. Consequently, the support set approximation of $\ell_1$ minimization is asymptotically perfect with high probability as $\epsilon_0 \to 0$.

Proof. For simplicity, we take $\epsilon_1$ in the definition of $C_f(\epsilon_0)$ to be equal to $\epsilon_0$, which only provides an upper bound. Since $f^{(r)}(0) > 0$ and $f(\cdot)$ is continuous, we conclude that for some constant $c > 0$, and sufficiently small $x$, $f(x) \ge c\,x^r$. Therefore,

$$\frac{1}{2} - Q_f(x) = \int_0^x f(t)\,dt \ge \frac{c}{r+1}x^{r+1}, \tag{35}$$

and thus,

$$x \ge \Psi_f\left(\frac{1}{2} - \frac{c}{r+1}x^{r+1}\right), \tag{36}$$

for sufficiently small $x$. Note that we have used the fact that $\Psi_f(\cdot)$ is a decreasing function. Equivalently, (36) means that

$$\Psi_f\left(\tfrac{1}{2} - x\right) = O\left(x^{1/(r+1)}\right) \tag{37}$$

as $x \to 0$. On the other hand, note that $\frac{1-\epsilon_0}{2(1+\epsilon_0)} \ge \frac{1}{2} - \epsilon_0$ and thus:

$$\Psi_f\left(\frac{1-\epsilon_0}{2(1+\epsilon_0)}\right) \le \Psi_f\left(\tfrac{1}{2} - \epsilon_0\right). \tag{38}$$

It follows from the above, (37), and the fact that $f(x) = O(x^r)$ as $x \to 0$ that

$$\int_0^{\Psi_f\left(\frac{1-\epsilon_0}{2(1+\epsilon_0)}\right)} x f(x)\,dx = O\left(\epsilon_0^{(r+2)/(r+1)}\right), \tag{39}$$

as $\epsilon_0 \to 0$. Furthermore, from Theorem 5.2 we know that $C(\epsilon_1) \ge \frac{1}{\sqrt{1-\epsilon_0}}$ (note that $\epsilon_1 = \epsilon_0$) and therefore $\frac{C(\epsilon_1)}{C(\epsilon_1)-1} = O(1/\epsilon_0)$ as $\epsilon_0 \to 0$. Also, $E_{X \sim f(\cdot)}|X| > 0$ is constant. Therefore, from these conclusions and the definition of $C_f(\cdot)$ it follows that $C_f(\epsilon_0) = O(\epsilon_0^{1/(r+1)})$, as $\epsilon_0 \to 0$.

Figure 2: Theoretical lower bound on the correct support estimation of $\ell_1$ minimization, as a function of the weak threshold exceeding fraction $\epsilon_0$. The plots are based on the theoretical results of Theorem 7.1 and are derived for Gaussian, uniform and two sided Rayleigh distributions.
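The scaling $\int_0^{\Psi_f(1/2-\epsilon)} x f(x)\,dx = O(\epsilon^{(r+2)/(r+1)})$ used in the proof can be checked numerically on two simple densities. The sketch below is only illustrative: the function names, the unit-scale two sided Rayleigh density $f(x) = \frac{|x|}{2}e^{-x^2/2}$, and the two sample values of $\epsilon$ are assumptions. It estimates the exponent from a log-log slope, which should be close to 2 for the uniform density ($r = 0$) and close to $3/2$ for the Rayleigh density ($r = 1$).

```python
import math

def tail_moment(pdf, psi_half_minus, eps, steps=20000):
    """Midpoint approximation of int_0^t x f(x) dx with t = Psi_f(1/2 - eps)."""
    t = psi_half_minus(eps)
    h = t / steps
    return sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(steps)) * h

def loglog_slope(pdf, psi_half_minus, e1=1e-3, e2=1e-4):
    """Empirical exponent p in I(eps) ~ eps**p, from two small values of eps."""
    i1 = tail_moment(pdf, psi_half_minus, e1)
    i2 = tail_moment(pdf, psi_half_minus, e2)
    return math.log(i1 / i2) / math.log(e1 / e2)

# Uniform on (-1, 1): f = 1/2 and Q_f(x) = (1 - x)/2, so Psi_f(1/2 - eps) = 2*eps.
uniform_slope = loglog_slope(lambda x: 0.5, lambda e: 2.0 * e)

# Two sided Rayleigh (unit scale): Q_f(x) = exp(-x^2/2)/2,
# so Psi_f(1/2 - eps) = sqrt(-2*log(1 - 2*eps)).
rayleigh_slope = loglog_slope(
    lambda x: 0.5 * abs(x) * math.exp(-x * x / 2.0),
    lambda e: math.sqrt(-2.0 * math.log(1.0 - 2.0 * e)),
)
print(uniform_slope, rayleigh_slope)
```

Dividing by the $O(1/\epsilon_0)$ growth of $\frac{C(\epsilon_1)}{C(\epsilon_1)-1}$ then leaves the exponent $1/(r+1)$ stated in Lemma 7.2.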



As a numerical example, we compute a theoretical bound for the approximate support recovery of $\ell_1$ minimization and threshold improvement in the case of $\delta = 0.5555$. It is easy to verify numerically that the conditions of Theorem 4.1 hold. The value of $\kappa^*$ is no more than $\sqrt{3}$ in this case. A theoretical bound on the overlap fraction between the $k$-support set of $\hat{x}$ and the support set of the $k$-sparse $x$ for an arbitrary distribution is provided by Theorem 7.1, where $k = (1+\epsilon_0)\mu_W(\delta)n$. We have computed this bound for three different distributions: Gaussian, uniform $(-1,1)$ and a two sided Rayleigh distribution. The value of $r$, namely the smallest nonzero derivative order, is 0 for Gaussian and uniform distributions, and is 1 for the Rayleigh distribution. The computed bounds are plotted in Figure 2. Furthermore, using a value of $\omega = 10$, and based on the premise of Theorem 4.1 and the computed bounds, we can certify an
improvement of eo = 5 x 10"'^ in the weak recovery threshold in the case of Gaussian distribution. For the 
uniform and Rayleigh distributions, the theoretical predictions of the improvement in recovery thresholds are smaller than in the Gaussian case, but are still strictly positive. These improvement guarantees are of course much smaller than the improvements observed in practice, as will be illustrated in the following section.

Figure 3: Empirical Recovery Percentage for $n = 200$ and $\delta = 0.5555$.

8 Simulations 

We demonstrate the validity of the theoretical results of the previous sections, and the performance of Algorithm 1, by a few numerical simulations. The purpose of the simulations of this section is both to evaluate the performance of the proposed reweighted $\ell_1$ algorithm in practice, and to verify its distribution dependent behavior. Figure 3 shows the empirical performance of Algorithm 1 for sparse signals with various distributions. Here the signal dimension is $n = 200$, and the number of measurements is $m = 112$, which corresponds to $\delta \approx 0.5555$. We generated random sparse signals with i.i.d. entries coming from certain distributions, namely Gaussian, uniform, Rayleigh, square root of $\chi$-square with 4 degrees of freedom, and square root of $\chi$-square with 6 degrees of freedom. All of these distributions are continuous and have some finite-order non-vanishing derivative at the origin. In fact, in an increasing order of the mentioned distributions, the smallest order of nonzero derivative at the origin varies from 0 to 3. In other words, the pdf of a Gaussian and a uniform $(-1, 1)$ distribution is nonzero at 0. The pdf of the Rayleigh distribution is zero at the origin, but has a nonzero derivative. Finally, the pdf's of the square root of a $\chi$-square with 4 and 6 degrees of freedom have second and third nonzero derivatives at the origin, respectively. In Figure 3, solid lines represent the simulation results for ordinary $\ell_1$ minimization, and different colors indicate different distributions. Dashed lines are used to show the results for Algorithm 1. Notice that the more derivatives vanish at the origin, the less significant the observed improvement over $\ell_1$ minimization, which is consistent with the analysis of Section 7. The Gaussian and uniform distributions are flat and nonzero at the origin and show an impressive improvement of more than 20% in the weak threshold (from 45 to 55 in this case).
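A small experiment in the spirit of the one above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: it assumes NumPy and SciPy are available, solves each weighted $\ell_1$ problem as a linear program with `scipy.optimize.linprog`, takes the support estimate to be the $k$ largest-magnitude entries of the first-stage solution, and penalizes the remaining entries with a weight $\omega > 1$. The function names, the problem sizes and the value $\omega = 10$ are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(A, y, weights):
    """Solve min sum_i weights[i]*|z_i| subject to A z = y as a linear program."""
    n = A.shape[1]
    c = np.concatenate([weights, weights])      # cost on z_plus and z_minus
    A_eq = np.hstack([A, -A])                   # A (z_plus - z_minus) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

def two_step_reweighted_l1(A, y, k, omega):
    """Step 1: plain l1. Step 2: weight omega > 1 off the k largest entries of step 1."""
    n = A.shape[1]
    x_l1 = weighted_l1(A, y, np.ones(n))
    likely = np.argsort(-np.abs(x_l1))[:k]      # assumed support estimate
    w = np.full(n, float(omega))
    w[likely] = 1.0
    return x_l1, weighted_l1(A, y, w), w

rng = np.random.default_rng(0)
n, m, k = 60, 34, 12
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
x_l1, x_rw, w = two_step_reweighted_l1(A, A @ x, k, omega=10.0)
print(np.abs(x_l1 - x).sum(), np.abs(x_rw - x).sum())
```

Repeating this over many random draws and several values of $k$, and recording how often each stage returns $x$, would give an empirical recovery curve of the kind shown in Figure 3.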

In Figure 4, the overlap between the support set of a $k$-sparse signal $x$ and the $k$-support set of the approximation $\hat{x}$ given by $\ell_1$ minimization, averaged over 400 random samples, is plotted. Again, five different distributions were considered. It is apparent that the overlap fraction is a decreasing function of $k$, and depends on the smoothness of the probability distribution at the origin.
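The overlap fraction plotted in Figure 4 can be computed as in the following sketch. It assumes that the $k$-support set of a vector means the index set of its $k$ largest-magnitude entries; the helper names are hypothetical.

```python
def k_support(vec, k):
    """Indices of the k largest-magnitude entries of vec (ties broken by index)."""
    order = sorted(range(len(vec)), key=lambda i: (-abs(vec[i]), i))
    return set(order[:k])

def overlap_fraction(x, x_hat):
    """Fraction of supp(x) captured by the k-support set of x_hat, with k = |supp(x)|."""
    support = {i for i, v in enumerate(x) if v != 0}
    k = len(support)
    return len(support & k_support(x_hat, k)) / k

# The true support is {1, 3}; the two largest entries of the estimate sit at {1, 2}.
print(overlap_fraction([0, 3, 0, -2], [0.1, 2.9, 0.5, -0.2]))
```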

Figure 4: Empirical overlap between the support set of a $k$-sparse vector and the $k$-support set of the $\ell_1$ optimum, for $n = 200$ and $\delta = 0.5555$. Nonzero coefficients of the signal are drawn from five different distributions (displayed). The average is over 400 samples.

We also report experimental results using regular $\ell_1$ and reweighted $\ell_1$ minimization recovery algorithms over real world data. We have chosen a pair of satellite images (Figure 5) taken in two different years, 1989 (left) and 2000 (right), from the New Britain rainforest in Papua New Guinea. The images originally belong to the Royal Society for the Protection of Birds and were taken from the Guardian archive, from an article on deforestation. Such images are generally recorded to evaluate environmental effects such as deforestation. The difference of images taken at different times is generally not very significant, and thus can be thought of as compressible. We have applied $\ell_1$ minimization to recover the difference image over the subframe (subset of the original images) identified by the red rectangles in Figure 5. In addition, we also implemented the reweighted $\ell_1$ minimization of Algorithm 1 with $k = 0.1n$ ($n$ being the total number of frame pixels), which assumed no prior knowledge about the structural sparsity of the signal or the nonzero coefficients. This value of $k$ was chosen heuristically, and is close to the actual support size of the signal. The original size of the image is $275 \times 227$. We reduced the resolution by roughly a factor of 0.05 to make the $\ell_1$ solver in MATLAB more tractable. In addition, only the gray scale version of the difference image was taken into account, and was normalized so that the maximum intensity is 1. Furthermore, prior to compression, the difference image was further sparsified by rounding the intensities less than 0.1 to zero. We pick the weight value $\omega = 2$ for the weighting stage of the reweighted $\ell_1$ algorithm. The normalized recovery error is defined to be the sum square of the intensity differences in the recovered and the original image, divided by the sum square of the original image intensity, i.e. $\sum_{i \in \text{frame}} (I_i - \hat{I}_i)^2 / \sum_{i \in \text{frame}} I_i^2$. The average normalized error for $\ell_1$ minimization and reweighted $\ell_1$ minimization is displayed in Figure 7a as a function of $\delta$. The average is taken over 50 realizations of i.i.d. Gaussian measurement matrices for each $\delta$. As can be seen, the recovery improvement is significant in the reweighted $\ell_1$ minimization.
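The preprocessing and the error metric described above can be written compactly as follows. This is a hedged sketch operating on a flattened list of pixel intensities; the function names are hypothetical, and normalizing by the largest magnitude (so that negative differences are handled) is an assumption.

```python
def sparsify(image, threshold=0.1):
    """Scale intensities so the largest magnitude is 1, then zero out small entries."""
    peak = max(abs(v) for v in image)
    scaled = [v / peak for v in image]
    return [v if abs(v) >= threshold else 0.0 for v in scaled]

def normalized_error(original, recovered):
    """Sum of squared intensity differences over the sum of squared original intensities."""
    num = sum((a - b) ** 2 for a, b in zip(original, recovered))
    den = sum(a ** 2 for a in original)
    return num / den

frame = sparsify([4.0, 0.2, -2.0])           # gives [1.0, 0.0, -0.5]
print(frame, normalized_error(frame, [1.0, 0.0, 0.0]))
```

An all-zero reconstruction gives a normalized error of exactly 1, which is a convenient baseline when reading curves such as those in Figure 7.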

Another experiment was done on a pair of brain fMRI images taken at two different instances of time, shown in Figure 6. Similar to the satellite images, the objective is to recover the difference image from a set of compressed measurements. The original image size is $271 \times 271$, and preprocessing steps similar to those for the satellite images were applied before compression. We used $\ell_1$ minimization and Algorithm 1 with no presumed prior information, with $k = 0.1n$ and $\omega = 1.3$. The average normalized recovery errors are displayed in Figure 7b, from which we can infer similar conclusions as in the case of satellite images.

Figure 5: Satellite images taken from the New Britain rainforest in Papua New Guinea in 1989 (left) and 2000 (right). Red boxes identify the subframe used for the experiment, and green boxes identify the regions with higher associated weight in the weighted $\ell_1$ recovery. Image originally belongs to the Royal Society for the Protection of Birds and was taken from the Guardian archive, from an article on deforestation: http://www.guardian.co.uk/environment/2008/jan/09/endangeredspecies.endangeredhabitats




Figure 6: Functional MRI images of the brain at two different instances illustrating the brain activity. Green boxes identify the region with higher associated weight in the weighted $\ell_1$ recovery. Image is adopted from https://sites.google.com/site/psychopharmacology2010/student-wiki-for-quiz-9

9 Conclusion 

We introduced a new two-step reweighted $\ell_1$ minimization for the recovery of linearly compressed sparse signals. We proved that for sparse signals whose nonzero entries are drawn from a broad class of continuous distributions, the proposed algorithm achieves a recovery threshold strictly better than that of $\ell_1$ minimization. Our theoretical analysis predicts that the performance improvement strongly depends on the distribution of the nonzero entries, and should be better for distributions with a smaller non-vanishing order of derivative at the origin. This was very closely verified by our numerical simulations. For distributions with no finite order non-vanishing derivative at the origin, our analysis does not predict any improvement in the performance. This is also the case in practice: for ternary signals with nonzero values equal to ±1, no improvement is observed in the empirical recovery threshold over the regular $\ell_1$ minimization. Our analysis was based on random Gaussian measurement matrices, and the robustness results of $\ell_1$ minimization. Possible related future research could address other measurement matrix ensembles, and the development of reweighted algorithms that can universally improve the recovery performance of linear programming. On the other hand, the improvement predictions using our theoretical tools are not tight, due to upper bounding techniques and worst case considerations in various parts of our proofs, especially in predicting the approximate support recovery potential of $\ell_1$ minimization. Future work can concentrate on tightening these bounds through more refined techniques, and consequently achieving more promising performance guarantees for reweighted linear programming.

Figure 7: Average normalized recovery error for $\ell_1$ and reweighted $\ell_1$ minimization recovery of the difference between the subframes of (a) a pair of satellite images shown in Figure 5 and (b) the pair of brain fMRI images shown in Figure 6. Data is averaged over
different realizations of measurement matrices for each $\delta$.

A Proof of Lemma 5.2

Let $a = \Psi\left(\frac{M}{2N}\right)$. We consider random variables $\tilde{X}_i = |X_i| \cdot \mathbf{1}(|X_i| > a)$ for each $1 \le i \le N$, where $\mathbf{1}(|X_i| > a)$ is equal to 1 if $|X_i| > a$, and is 0 otherwise. Also, let $\tilde{S} = \tilde{X}_1 + \tilde{X}_2 + \cdots + \tilde{X}_N$. We first note that the empirical average of the $\tilde{X}_i$'s converges to its expectation. More formally, an application of the Bernstein concentration inequality (see e.g., [15]) implies that for every $\epsilon' > 0$ and for some $c_1 > 0$, the following holds:

$$P\left(\left|\tilde{S}/N - E(\tilde{S}/N)\right| > \epsilon'\right) < \exp(-c_1 N \epsilon'). \tag{40}$$

On the other hand:

$$E(\tilde{S}/N) = E\tilde{X}_i = E\left(|X_i|\,\mathbf{1}(|X_i| > a)\right) = \sqrt{\frac{2}{\pi}}\,e^{-\frac{a^2}{2}}. \tag{41}$$

Similarly, for the random variable $S_N = |X_1| + |X_2| + \cdots + |X_N|$, we can write the following concentration inequality using the Chernoff bound for some $c_2 > 0$:

$$P\left(\left|S_N/N - E(S_N/N)\right| > \epsilon'\right) < \exp(-c_2 N \epsilon'). \tag{42}$$

Since $E(S_N/N) = \sqrt{2/\pi}$, this establishes (16).

Let the random variable $M'$ be the number of nonzero $\tilde{X}_i$'s. First of all, note that $\tilde{S} = S_{M'}$. The rest of the proof includes the following steps. We prove that $S_{M'}/S_N$ is concentrated around $E S_{M'} / E S_N$ with high probability. Then we use the fact that $M'$ also converges to its expected value, $M$, to show that $S_M/S_N$ becomes arbitrarily close to $S_{M'}/S_N$. As a result, $S_M/S_N$ will be concentrated around $E S_{M'} / E S_N$ with high probability, which is the desired result.



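Concentration of $S_{M'}/S_N$ is shown by using equations (40), (41) and (42) simultaneously. Combining the two inequalities, we conclude that

$$P\left(\left|\frac{S_{M'}}{N} - \sqrt{\frac{2}{\pi}}\,e^{-\frac{a^2}{2}}\right| < \epsilon' \ \text{ and } \ \left|\frac{S_N}{N} - \sqrt{\frac{2}{\pi}}\right| < \epsilon'\right) > 1 - e^{-c_1 N \epsilon'} - e^{-c_2 N \epsilon'}, \tag{43}$$

and thus,

$$P\left(\frac{\sqrt{\frac{2}{\pi}}\,e^{-\frac{a^2}{2}} - \epsilon'}{\sqrt{\frac{2}{\pi}} + \epsilon'} < \frac{S_{M'}}{S_N} < \frac{\sqrt{\frac{2}{\pi}}\,e^{-\frac{a^2}{2}} + \epsilon'}{\sqrt{\frac{2}{\pi}} - \epsilon'}\right) > 1 - e^{-c_1 N \epsilon'} - e^{-c_2 N \epsilon'}, \tag{44}$$

and consequently:

$$P\left(\left|\frac{S_{M'}}{S_N} - e^{-\frac{a^2}{2}}\right| < \frac{2\sqrt{2/\pi}\,(e^{-\frac{a^2}{2}} + 1)\,\epsilon'}{2/\pi - \epsilon'^2}\right) > 1 - e^{-c_1 N \epsilon'} - e^{-c_2 N \epsilon'}. \tag{45}$$

If $\epsilon'$ is sufficiently small, then $\frac{2\sqrt{2/\pi}\,(e^{-a^2/2} + 1)\,\epsilon'}{2/\pi - \epsilon'^2} \le \alpha\epsilon'$ for some constant $\alpha > 0$. Taking $\epsilon'' = \alpha\epsilon'$, $\alpha_1 = c_1/\alpha$ and $\alpha_2 = c_2/\alpha$, we can say that for sufficiently small $\epsilon''$ the following holds:

$$P\left(\left|\frac{S_{M'}}{S_N} - e^{-\frac{a^2}{2}}\right| < \epsilon''\right) > 1 - e^{-\alpha_1 N \epsilon''} - e^{-\alpha_2 N \epsilon''}. \tag{46}$$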
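The algebra behind the step from a two-sided bound on $S_{M'}/S_N$ to a single deviation bound can be checked numerically. The sketch below is illustrative and rests on the following reading of the argument: with $A = \sqrt{2/\pi}$ and $e = e^{-a^2/2}$, the ratio lies between $\frac{Ae - \epsilon'}{A + \epsilon'}$ and $\frac{Ae + \epsilon'}{A - \epsilon'}$, and both endpoints are within $\frac{2A(e+1)\epsilon'}{A^2 - \epsilon'^2}$ of $e$ whenever $\epsilon' < A$. Function names and the grid of test values are assumptions.

```python
import math

A = math.sqrt(2.0 / math.pi)   # E|X| for a standard Gaussian X

def endpoint_deviations(e, eps):
    """Distances from e of the two endpoints (A*e - eps)/(A + eps) and (A*e + eps)/(A - eps)."""
    upper = (A * e + eps) / (A - eps) - e
    lower = e - (A * e - eps) / (A + eps)
    return lower, upper

def deviation_bound(e, eps):
    """The single bound 2*A*(e + 1)*eps / (A^2 - eps^2)."""
    return 2.0 * A * (e + 1.0) * eps / (A * A - eps * eps)

checks = [
    max(endpoint_deviations(e, eps)) <= deviation_bound(e, eps)
    for e in (0.05, 0.25, 0.5, 0.75, 1.0)
    for eps in (1e-4, 1e-3, 1e-2, 0.1, 0.5)
]
print(all(checks))
```

The reason is that the two deviations equal $\frac{\epsilon'(1+e)(A \mp \epsilon')}{A^2 - \epsilon'^2}$, and $A + \epsilon' \le 2A$ for $\epsilon' \le A$.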



Now we show that the quantity $\frac{|S_M - S_{M'}|}{S_N}$ can be arbitrarily small for large $N$. To do so, assume without loss of generality that $|X_1| \ge |X_2| \ge \cdots \ge |X_N|$, and that $M_1 = \min(M, M')$, and $M_2 = \max(M, M')$. We then have:

$$|S_M - S_{M'}| = |X_{M_1+1}| + |X_{M_1+2}| + \cdots + |X_{M_2}|, \tag{47}$$

and

$$\begin{aligned}
|S_N| &= |X_1| + |X_2| + \cdots + |X_N| \\
&\ge |X_1| + |X_2| + \cdots + |X_{M_1}| \\
&\ge (N - M_1)\,|X_{M_1}| \\
&\ge \frac{N - M_1}{M_2 - M_1}\,|S_{M_2} - S_{M_1}| \qquad (48) \\
&\ge \frac{N - M}{|M - M'|}\,|S_M - S_{M'}|. \qquad (49)
\end{aligned}$$

Note that equation (48) holds because $|X_{M_1}|$ is larger than all the values $|X_{M_1+1}|, \ldots, |X_{M_2}|$, and is therefore larger than $1/(M_2 - M_1)$ times their sum. It directly follows from (49) that:

$$\frac{|S_M - S_{M'}|}{S_N} \le \frac{|M' - M|}{N - M}. \tag{50}$$



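Therefore, to show the concentration of the left hand side in the above inequality, it suffices to show that $\frac{|M' - M|}{N - M}$ concentrates. Since the variables $X'_i = \mathbf{1}(|X_i| > a)$ are independent Bernoulli random variables with probability $2Q(a) = \frac{M}{N}$ of being nonzero, a Chernoff concentration bound on their empirical average implies that

$$P\left(\left|\frac{\sum_{i=1}^{N} X'_i}{N} - EX'\right| > \epsilon'''\right) < e^{-c_3 N \epsilon'''} \tag{51}$$

for some $c_3 > 0$, and for every $\epsilon''' > 0$, where $X'$ has the same distribution as all $X'_i$'s. Noting that $\sum_{i=1}^{N} X'_i = M'$ and $EX' = M/N$, the above implies that:

$$P\left(\frac{|M - M'|}{N} < \epsilon'''\right) = P\left(\frac{|M - M'|}{N - M} < \frac{\epsilon'''}{1 - M/N}\right) > 1 - e^{-c_3 N \epsilon'''}. \tag{52}$$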
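The concentration of $M'$ around $M$ is easy to observe in simulation. The following sketch is only an illustration under assumed values ($N = 20000$, $M/N = 0.6$, a fixed seed): it draws standard Gaussian samples, sets $a$ so that $2Q(a) = M/N$, and counts how many magnitudes exceed $a$.

```python
import random
from statistics import NormalDist

def count_above(n, a, rng):
    """M' = number of |X_i| > a among n i.i.d. standard Gaussian samples."""
    return sum(1 for _ in range(n) if abs(rng.gauss(0.0, 1.0)) > a)

n, frac = 20000, 0.6                        # frac plays the role of M/N
a = NormalDist().inv_cdf(1.0 - frac / 2.0)  # a = Psi(M/(2N)), i.e. 2*Q(a) = M/N
m_prime = count_above(n, a, random.Random(1))
deviation = abs(m_prime / n - frac)
print(m_prime, deviation)
```

The standard deviation of $M'/N$ is $\sqrt{\frac{M}{N}(1 - \frac{M}{N})/N}$, roughly 0.0035 for these values, so the observed deviation is expected to be of that order.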



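If the ratio $M/N$ is kept constant, the quantity $\frac{\epsilon'''}{1 - M/N}$ will be smaller than any $\epsilon > 0$ as $\epsilon'''$ becomes arbitrarily small, which shows the concentration of $\frac{|M - M'|}{N - M}$. Using this and the inequality of (50) we can conclude that $\frac{|S_M - S_{M'}|}{S_N} < \epsilon$ with probability at least $1 - e^{-\alpha_3 N \epsilon}$ for some constant $\alpha_3 > 0$. Combining this latter conclusion with (46), it follows that

$$P\left(\left|\frac{S_M}{S_N} - e^{-\frac{a^2}{2}}\right| < \epsilon + \epsilon''\right) > 1 - e^{-\alpha_1 N \epsilon''} - e^{-\alpha_2 N \epsilon''} - e^{-\alpha_3 N \epsilon}. \tag{53}$$

Consequently, we conclude that if $\epsilon$ is sufficiently small, the following holds:

$$P\left(\left|\frac{S_M}{S_N} - e^{-\frac{a^2}{2}}\right| < \epsilon\right) > 1 - 3e^{-cN\epsilon} \tag{54}$$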



for some $c > 0$, which concludes the proof of (17).

B Proof of Lemma 5.3

Let $\beta = 1 - 2Q\left(\sqrt{-2\log(1-\alpha)}\right)$, and without loss of generality assume that the $k$ nonzero values of $x$ are $x_1, x_2, \ldots, x_k$, with $|x_1| \le |x_2| \le \cdots \le |x_k|$. In order to show that $W(x, \alpha\|x\|_1) < k(\beta + \epsilon)$, it suffices to show that $\sum_{i=1}^{k(\beta+\epsilon)} |x_i| > \alpha\|x\|_1$. Applying the order statistic result of Lemma 5.2 we have that with high probability:

$$\frac{\sum_{i=1}^{k(\beta+\epsilon)} |x_i|}{\|x\|_1} \approx 1 - \exp\left(-\frac{\Psi^2\left(\frac{1-\beta-\epsilon}{2}\right)}{2}\right) > 1 - \exp\left(-\frac{\Psi^2\left(\frac{1-\beta}{2}\right)}{2}\right) = \alpha, \tag{55}$$

which concludes the proof.
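The identity that closes this argument, namely that the smallest fraction $\beta = 1 - 2Q(\sqrt{-2\log(1-\alpha)})$ of Gaussian magnitudes carries a share $\alpha$ of the $\ell_1$ norm in the limit, can be verified directly. The sketch below uses the Python standard library's `statistics.NormalDist`; the function names are illustrative.

```python
import math
from statistics import NormalDist

_std = NormalDist()

def Q(t):
    """Gaussian tail probability Q(t) = P(X > t)."""
    return 1.0 - _std.cdf(t)

def Psi(p):
    """Inverse of Q."""
    return _std.inv_cdf(1.0 - p)

def beta(alpha):
    """beta = 1 - 2*Q(sqrt(-2*log(1 - alpha)))."""
    return 1.0 - 2.0 * Q(math.sqrt(-2.0 * math.log(1.0 - alpha)))

def smallest_fraction_mass(b):
    """Limiting share of ||x||_1 carried by the smallest fraction b of Gaussian magnitudes."""
    return 1.0 - math.exp(-Psi((1.0 - b) / 2.0) ** 2 / 2.0)

for alpha in (0.05, 0.3, 0.7):
    print(alpha, beta(alpha), smallest_fraction_mass(beta(alpha)))
```

Because $\Psi$ is decreasing, enlarging the fraction from $\beta$ to $\beta + \epsilon$ strictly increases this share, which is the strict inequality used in the proof.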

C Computation of the $\Delta_c$ Threshold

In [17], a "sectional" threshold $\delta_c^{(T)}(\gamma_1, \gamma_2, f_1, f_2, \omega)$ is defined, with the following implication. Let $L$ be an index set of size $\gamma_1 n$. If $\delta > \delta_c^{(T)}(\gamma_1, \gamma_2, f_1, f_2, \omega)$, then a sparse vector $x$ with a random sign pattern with exactly $\gamma_1 f_1 n$ nonzero entries over $L$ and exactly $\gamma_2 f_2 n$ entries over $\bar{L}$ can be recovered using the following weighted $\ell_1$ minimization:

$$\min \|z_L\|_1 + \omega\|z_{\bar{L}}\|_1 \quad \text{subject to } Az = Ax \tag{56}$$

The reason $\delta_c^{(T)}$ is called sectional is that it provides a recovery guarantee for all support set x satisfying the nonuniform sparsity pattern, but almost all support sets. From this definition, it immediately follows that the $\Delta_c$ of Theorem 6.1 is given by:



Ac= , max 5P{^,l-^J[,f^M- (57) 



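Furthermore, the explicit derivation of $\delta_c^{(T)}$ is given in [17], which is as follows:

$$\delta_c^{(T)} = \min\{\delta \mid \psi_{com}(\tau_1, \tau_2) - \psi_{int}(\tau_1, \tau_2) - \psi_{ext}(\tau_1, \tau_2) < 0 \quad \forall\ 0 \le \tau_1 \le \gamma_1(1 - f_1),\ 0 \le \tau_2 \le \gamma_2(1 - f_2),\ \tau_1 + \tau_2 > \delta - \gamma_1 f_1 - \gamma_2 f_2\}$$

where $\psi_{com}$, $\psi_{int}$ and $\psi_{ext}$ are obtained as follows. Define $g(x) = \frac{2}{\sqrt{\pi}}e^{-x^2}$, $G(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-y^2}\,dy$ and let $\varphi(\cdot)$ and $\Phi(\cdot)$ be the standard Gaussian pdf and cdf functions respectively.

$$\psi_{com}(\tau_1, \tau_2) = \left(\tau_1 + \tau_2 + \gamma_1(1 - f_1)H\left(\frac{\tau_1}{\gamma_1(1 - f_1)}\right) + \gamma_2(1 - f_2)H\left(\frac{\tau_2}{\gamma_2(1 - f_2)}\right) + \gamma_1 H(f_1) + \gamma_2 H(f_2)\right)\log 2 \tag{58}$$

where $H(\cdot)$ is the Shannon entropy function. Define $c = (\tau_1 + \gamma_1 f_1) + \omega^2(\tau_2 + \gamma_2 f_2)$, $\alpha_1 = \gamma_1(1 - f_1) - \tau_1$ and $\alpha_2 = \gamma_2(1 - f_2) - \tau_2$. Let $x_0$ be the unique solution to $x$ of the equation $2c - \frac{g(x)\alpha_1}{xG(x)} - \frac{\omega g(\omega x)\alpha_2}{xG(\omega x)} = 0$. Then

$$\psi_{ext}(\tau_1, \tau_2) = c x_0^2 - \alpha_1 \log G(x_0) - \alpha_2 \log G(\omega x_0) \tag{59}$$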
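The external angle exponent can be evaluated with a few lines of code. The sketch below is a minimal illustration, assuming $g(x) = \frac{2}{\sqrt{\pi}}e^{-x^2}$ so that $G$ is the error function, and using plain bisection to solve $2c - \frac{g(x)\alpha_1}{xG(x)} - \frac{\omega g(\omega x)\alpha_2}{xG(\omega x)} = 0$. The bracketing interval, iteration count and sample parameter values are arbitrary choices for illustration, not values from the analysis.

```python
import math

def g(x):
    """g(x) = (2/sqrt(pi)) * exp(-x^2)."""
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)

def G(x):
    """G(x) = (2/sqrt(pi)) * int_0^x exp(-y^2) dy, i.e. the error function."""
    return math.erf(x)

def stationarity(x, c, a1, a2, omega):
    """Left-hand side of 2c - g(x)*a1/(x*G(x)) - omega*g(omega*x)*a2/(x*G(omega*x)) = 0."""
    return (2.0 * c - g(x) * a1 / (x * G(x))
            - omega * g(omega * x) * a2 / (x * G(omega * x)))

def psi_ext(c, a1, a2, omega, lo=1e-6, hi=50.0, iters=200):
    """Bisection for the root x0, then c*x0^2 - a1*log G(x0) - a2*log G(omega*x0)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stationarity(mid, c, a1, a2, omega) < 0.0:
            lo = mid          # left-hand side still negative: root lies to the right
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return x0, c * x0 * x0 - a1 * math.log(G(x0)) - a2 * math.log(G(omega * x0))

x0, value = psi_ext(c=0.3, a1=0.2, a2=0.1, omega=2.0)
print(x0, value)
```

Since $G' = g$, the equation being solved is $1/x$ times the derivative of $c x^2 - \alpha_1 \log G(x) - \alpha_2 \log G(\omega x)$, so $x_0$ is a stationary point of that expression.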
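Let $b = \frac{\tau_1 + \omega^2\tau_2}{\tau_1 + \tau_2}$, $\Omega' = \gamma_1 f_1 + \omega^2\gamma_2 f_2$ and $Q(s) = \frac{\tau_1\varphi(s)}{(\tau_1 + \tau_2)\Phi(s)} + \frac{\omega\tau_2\varphi(\omega s)}{(\tau_1 + \tau_2)\Phi(\omega s)}$. Define the function $M(s) = -\frac{s}{Q(s)}$ and solve for $s$ in $M(s) = \frac{\tau_1 + \tau_2}{(\tau_1 + \tau_2)b + \Omega'}$. Let the unique solution be $s^*$ and set $y = s^*\left(b - \frac{1}{M(s^*)}\right)$. Compute the rate function $\Lambda^*(y) = sy - \frac{\tau_1}{\tau_1 + \tau_2}\Lambda_1(s) - \frac{\tau_2}{\tau_1 + \tau_2}\Lambda_1(\omega s)$ at the point $s = s^*$, where $\Lambda_1(s) = \frac{s^2}{2} + \log(2\Phi(s))$. The internal angle exponent is then given by:

$$\psi_{int}(\tau_1, \tau_2) = \left(\Lambda^*(y) + \frac{\tau_1 + \tau_2}{2\Omega'}y^2 + \log 2\right)(\tau_1 + \tau_2) \tag{60}$$

When $f_1 \to 1$ and $f_2 \to 0$, the terms $\Delta_c(\gamma_1, \gamma_2, f_1, f_2, \omega)$ and $\delta_c^{(T)}(\gamma_1, \gamma_2, f_1, f_2, \omega)$ become arbitrarily close, and converge to $\delta_c(\gamma_1, \gamma_2, 1, 0, \omega)$, which is defined as the weak threshold of weighted $\ell_1$ minimization for the nonuniform sparsity model with set fractions $\gamma_1, \gamma_2$ and sparsity fractions 1 and 0 (see [17]).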



D Proof of Theorem 5.2 



The proof of this theorem shares most of its technical details with [14], which are based on Grassmann manifold techniques for the performance analysis of compressed sensing. The method is
basically the extension of the high dimensional techniques of Donoho et al. [U [TU] for incorporating noise 
into the performance bounds of $\ell_1$ minimization. First consider the following lemma.

Lemma D.1. Let $A$ be a general $m \times n$ measurement matrix, $x$ be an $n$-element vector and $y = Ax$. Denote $K$ as a subset of $\{1, 2, \ldots, n\}$ such that its cardinality $|K| = k$ and further denote $\bar{K} = \{1, 2, \ldots, n\} \setminus K$. Let $w$ denote an $n \times 1$ vector. Let $C > 1$ be a fixed number.

Given a specific set $K$, suppose that the part of $x$ on $K$, namely $x_K$, is fixed. Then for all $x_{\bar{K}}$, any solution $\hat{x}$ produced by the $\ell_1$ minimization satisfies

$$\|x_K\|_1 - \|\hat{x}_K\|_1 \le \frac{2}{C-1}\|x_{\bar{K}}\|_1$$

and

$$\|(x - \hat{x})_{\bar{K}}\|_1 \le \frac{2C}{C-1}\|x_{\bar{K}}\|_1,$$

if and only if for all $w \in \mathbb{R}^n$ such that $Aw = 0$, we have

$$\|x_K + w_K\|_1 + \left\|\frac{w_{\bar{K}}}{C}\right\|_1 \ge \|x_K\|_1. \tag{61}$$


In fact, if (61) is satisfied, we will have the stability result

In [13], it was established that when the matrix $A$ is sampled from an i.i.d. Gaussian ensemble, $C = 1$, considering a single index set $K$, there exists a constant ratio $0 < \mu_W < 1$ such that if $\frac{|K|}{n} < \mu_W$, then with overwhelming probability as $n \to \infty$, the condition (61) holds for all $w \in \mathbb{R}^n$ satisfying $Aw = 0$. Now if we take a single index set $K$ with cardinality $\frac{|K|}{n} = (1 - \epsilon_1)\mu_W$, we would like to derive a characterization of $C$, as a function of $\frac{|K|}{n} = (1 - \epsilon_1)\mu_W$, such that the condition (61) holds for all $w \in \mathbb{R}^n$ satisfying $Aw = 0$.

When the measurement matrix $A$ is sampled from an i.i.d. Gaussian ensemble, it is known that the probability that the condition (61) holds for all $w \in \mathbb{R}^n$ satisfying $Aw = 0$ is the Grassmann angle, namely the probability that an $(n - m)$-dimensional uniformly distributed subspace intersects a polyhedral cone trivially (intersecting only at the apex of the cone). The complementary probability that the condition (61) does not hold for all $w \in \mathbb{R}^n$ satisfying $Aw = 0$ is the complementary Grassmann angle. In our problem, without loss of generality, we scale $x_K$ (extended to an $n$-dimensional vector supported on $K$) to a point in the relative interior of a $(k - 1)$-dimensional face $F$ of the weighted $\ell_1$ ball,

$$\mathrm{SP} = \left\{y \in \mathbb{R}^n \ \Big|\ \|y_K\|_1 + \left\|\frac{y_{\bar{K}}}{C}\right\|_1 \le 1\right\}. \tag{62}$$

The polyhedral cone we are interested in for the complementary Grassmann angle is the cone $\mathrm{SP} - x_K$, namely the cone obtained by setting $x_K$ as the apex, and observing SP from this apex.

Building on the works by Santalo [20] and McMullen [21] in high dimensional integral geometry and convex polytopes, the complementary Grassmann angle for the $(k - 1)$-dimensional face $F$ can be explicitly expressed as the sum of products of internal angles and external angles [22]:

$$P = 2 \times \sum_{s \ge 0}\ \sum_{G \in \Im_{m+1+2s}(\mathrm{SP})} \beta(F, G)\,\gamma(G, \mathrm{SP}), \tag{63}$$

where $s$ is any nonnegative integer, $G$ is any $(m + 1 + 2s)$-dimensional face of SP ($\Im_{m+1+2s}(\mathrm{SP})$ is the set of all such faces), $\beta(\cdot, \cdot)$ stands for the internal angle and $\gamma(\cdot, \cdot)$ stands for the external angle. The internal angles and external angles are basically defined as follows [22], [21]:

• An internal angle $\beta(F_1, F_2)$ is the fraction of the hypersphere $S$ covered by the cone obtained by observing the face $F_2$ from the face $F_1$. The internal angle $\beta(F_1, F_2)$ is defined to be zero when $F_1 \not\subseteq F_2$ and is defined to be one if $F_1 = F_2$.

• An external angle $\gamma(F_3, F_4)$ is the fraction of the hypersphere $S$ covered by the cone of outward normals to the hyperplanes supporting the face $F_4$ at the face $F_3$. The external angle $\gamma(F_3, F_4)$ is defined to be zero when $F_3 \not\subseteq F_4$ and is defined to be one if $F_3 = F_4$.



¹Note that the dimension of the hypersphere $S$ here matches the dimension of the corresponding cone discussed. Also, the center of the hypersphere is the apex of the corresponding cone. All these defaults also apply to the definition of the external angles.

When $C = 1$, we denote the probability $P$ in (63) as $P_1$. By definition, the weak threshold $\mu_W$ is the supremum of $\frac{|K|}{n}$ such that the probability $P_1$ in (63) goes to 0 as $n \to \infty$. We need to show that for $\frac{|K|}{n} = (1 - \epsilon_1)\mu_W$ and $C = \frac{1}{\sqrt{1 - \epsilon_1}}$, (63) also goes to 0 as $n \to \infty$. To that end, we only need to show that the probability $P'$ that there exists a $w$ from the null space of $A$ such that

$$\|x_K + w_K\|_1 + \left\|\frac{w_{K_1}}{C_\infty}\right\|_1 + \left\|\frac{w_{K_2}}{C}\right\|_1 < \|x_K\|_1 \tag{64}$$

goes to 0 as $n \to \infty$, where $C_\infty$ is a large number which we may take as $\infty$ at the end, and $K_1$, $K_2$ and $K$ are disjoint sets such that $|K_1 \cup K| = \mu_W n$ and $K_1 \cup K_2 = \bar{K}$.

Then the probability $P'$ will be equal to the probability that an $(n - m)$-dimensional uniformly distributed subspace intersects the polyhedral cone $\mathrm{WSP} - x_K$ nontrivially (intersecting at some other points besides the apex of the cone), where WSP is the polytope

$$\mathrm{WSP} = \left\{y \in \mathbb{R}^n \ \Big|\ \|y_K\|_1 + \left\|\frac{y_{K_1}}{C_\infty}\right\|_1 + \left\|\frac{y_{K_2}}{C}\right\|_1 \le 1\right\}. \tag{65}$$

Then $P'$ is also a complementary Grassmann angle, which can be expressed by [22]:

$$P' = 2 \times \sum_{s \ge 0}\ \sum_{G \in \Im_{m+1+2s}(\mathrm{WSP})} \beta(F, G)\,\gamma(G, \mathrm{WSP}). \tag{66}$$

Now we only need to show $P' \le P_1$. If we denote $l = (m + 1 + 2s) + 1$ and $k = (1 - \epsilon_1)\mu_W n$, then in the polytope WSP there are in total $\binom{n-k}{l-k}2^{l-k}$ faces $G$ of dimension $(l - 1)$ such that $F \subseteq G$ and $\beta(F, G) \neq 0$.

However, we argue that when $C_\infty$ is very large, only $\binom{n-k_1}{l-k_1}2^{l-k}$ such faces $G$ of dimension $(l - 1)$ will contribute nonzero terms to $P'$ in (66), where $k_1 = \mu_W n$. In fact, a certain $(l - 1)$-dimensional face $G$ supported on the index set $L$ is the convex hull of $C_i e_i$, where $i \in L$, $C_i$ is the corresponding weighting for index $i$ (which is 1 for the set $K$, $C_\infty$ for the set $K_1$ and $C$ for the set $K_2$), and $e_i$ is the standard unit coordinate vector. Now we show that if $K_1 \not\subseteq L$, the corresponding term in (66) for the face $G$ will be 0 when $C_\infty$ is very large.
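The face counts in this argument can be checked by brute force on a small instance. The sketch below is illustrative only and fixes a counting convention that is an assumption here: a face $G \supseteq F$ is identified by its support $L \supseteq K$ with $|L| = l$ together with a sign for every index in $L \setminus K$, so that faces differing only in signs are counted separately. Under that convention, requiring $K_1 \subseteq L$ leaves $\binom{n-k_1}{l-k_1}2^{l-k}$ faces out of $\binom{n-k}{l-k}2^{l-k}$.

```python
from itertools import combinations, product
from math import comb

def count_faces(n, k, l, must_include=0):
    """Count (support, sign pattern) pairs describing (l-1)-faces that contain a fixed
    (k-1)-face on coordinates 0..k-1 and whose support also contains the next
    `must_include` coordinates (playing the role of K_1)."""
    required = set(range(k, k + must_include))
    total = 0
    for extra in combinations(range(k, n), l - k):   # L = K plus l-k new indices
        if required <= set(extra):
            total += sum(1 for _ in product((-1, 1), repeat=l - k))  # signs off K
    return total

n, k, k1, l = 7, 2, 4, 5
all_faces = count_faces(n, k, l)
contributing = count_faces(n, k, l, must_include=k1 - k)
print(all_faces, contributing)
```

For these values the two counts are 80 and 24, matching $\binom{5}{3}2^3$ and $\binom{3}{1}2^3$.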

Lemma D.2. Suppose that $F$ is a $(k - 1)$-dimensional face of WSP supported on the subset $K$ with $|K| = k$. Then the external angle $\gamma(G, \mathrm{WSP})$ between an $(l - 1)$-dimensional face $G$ supported on the set $L$ ($F \subseteq G$) and the polytope WSP is 0 when $K_1 \not\subseteq L$ and $C_\infty$ is large.

Proof. Without loss of generality, assume $K = \{n - k + 1, \ldots, n\}$. Consider the $(l - 1)$-dimensional face

$$G = \mathrm{conv}\{C_{n-l+1} \times e_{n-l+1}, \ldots, C_{n-k} \times e_{n-k}, e_{n-k+1}, \ldots, e_n\}$$

of WSP. The $2^{n-l}$ outward normal vectors of the supporting hyperplanes of the facets containing $G$ are given by

$$\left\{\sum_{p=1}^{n-l} j_p e_p / C_p + \sum_{p=n-l+1}^{n-k} e_p / C_p + \sum_{p=n-k+1}^{n} e_p,\quad j_p \in \{-1, 1\}\right\}.$$

Then the outward normal cone $c(G, \mathrm{WSP})$ at the face $G$ is the positive hull of these normal vectors. When $K_1 \not\subseteq L$, the fraction of the surface of the $(n - l - 1)$-dimensional sphere taken by the cone $c(G, \mathrm{WSP})$ is 0 since the corresponding $C_p$ is very large. ■

Now let us look at the internal angle $\beta(F, G)$ between the $(k - 1)$-dimensional face $F$ and an $(l - 1)$-dimensional face $G$, where $K_1$ is a subset of the support set of $G$. Notice that the only interesting case is when $F \subseteq G$, since $\beta(F, G) \neq 0$ only if $F \subseteq G$. We will see that if $F \subseteq G$, the cone $c(F, G)$ formed by observing $G$ from $F$ is the direct sum of a $(k - 1)$-dimensional linear subspace and the positive hull of $(l - k)$ vectors. These $(l - k)$ vectors are in the form

$$v_i = \left(-\tfrac{1}{k}, \ldots, -\tfrac{1}{k}, 0, \ldots, C_i, 0, \ldots, 0\right), \quad i \in L \setminus K.$$

For those vectors $v_i$ with $i \in K_1$, $C_i = C_\infty$. When $C_\infty$ is very large, the considered cone takes half of the space at each $i$-th coordinate with $i \in K_1$.

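So by the definition of the internal angle, the internal angle $\beta(F, G)$ is equal to $\frac{1}{2^{k_1 - k}} \times \beta(F, G_1)$, where $G_1$ is supported only on the set $L \setminus K_1$. It is known that this internal angle $\beta(F, G_1)$ is equal to the fraction of an $(l - k_1 - 1)$-dimensional sphere taken by a polyhedral cone formed by $(l - k_1)$ unit vectors with inner product $\frac{1}{1 + C^2 k}$ between each other. In this case, the internal angle is given by

$$\beta(F, G_1) = \frac{V_{l-k_1-1}\left(\frac{1}{1 + C^2 k},\, l - k_1 - 1\right)}{V_{l-k_1-1}(S^{l-k_1-1})}, \tag{67}$$

where $V_i(S^i)$ denotes the $i$-th dimensional surface measure on the unit sphere $S^i$, while $V_i(\alpha', i)$ denotes the surface measure for a regular spherical simplex with $(i + 1)$ vertices on the unit sphere $S^i$ and with inner product $\alpha'$ between these $(i + 1)$ vertices. Thus (67) is equal to $B\left(\frac{1}{1 + C^2 k},\, l - k_1\right)$, where

$$B(\alpha', m') = \theta^{\frac{m'-1}{2}} \sqrt{(m' - 1)\alpha' + 1}\ \pi^{-m'/2}\, \alpha'^{-1/2}\, J(m', \theta), \tag{68}$$

with $\theta = (1 - \alpha')/\alpha'$ and

$$J(m', \theta) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left(\int_0^{\infty} e^{-\theta v^2 + 2iv\lambda}\,dv\right)^{m'} e^{-\lambda^2}\,d\lambda. \tag{69}$$

If we take $C = \frac{1}{\sqrt{1 - \epsilon_1}}$, then

$$\frac{1}{1 + C^2 k} = \frac{1}{1 + k_1}.$$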
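The arithmetic behind this choice of $C$ is short: with $k = (1 - \epsilon_1)k_1$ and $C^2 = \frac{1}{1 - \epsilon_1}$, one gets $C^2 k = k_1$. The sketch below checks this numerically and also evaluates the amplification factor $\frac{2C}{C - 1}$ from Lemma D.1 for a small $\epsilon_1$. Function names and sample values are illustrative assumptions.

```python
import math

def robustness_constant(eps1):
    """C = 1/sqrt(1 - eps1), the value used for |K|/n = (1 - eps1)*mu_W."""
    return 1.0 / math.sqrt(1.0 - eps1)

def inner_products(k1, eps1):
    """Return (1/(1 + C^2 k), 1/(1 + k1)) with k = (1 - eps1)*k1."""
    k = (1.0 - eps1) * k1
    c = robustness_constant(eps1)
    return 1.0 / (1.0 + c * c * k), 1.0 / (1.0 + k1)

def amplification(eps1):
    """The factor 2C/(C - 1) multiplying ||x_Kbar||_1 in the robustness bound."""
    c = robustness_constant(eps1)
    return 2.0 * c / (c - 1.0)

print(inner_products(90, 0.2))
print(amplification(0.01), 4.0 / 0.01)
```

For small $\epsilon_1$, $C - 1 \approx \epsilon_1/2$, so $\frac{2C}{C-1}$ grows like $4/\epsilon_1$, which is the $O(1/\epsilon_0)$ behavior invoked in the proof of Lemma 7.2.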

By comparison, $\beta(F, G) = \frac{1}{2^{k_1 - k}} \times \beta(F, G_1)$ is exactly the $\frac{1}{2^{k_1 - k}}\beta(F_1, G_1)$ term appearing in the expression for the Grassmann angle $P$ between the face $F_1$ supported on the set $K_1$ and the polytope SP, where $G_1$ is an $(l - 1)$-dimensional face of SP supported on the set $L$.

Similar to the derivation for the internal angle, we can show that the external angle $\gamma(G, \mathrm{WSP})$ is also exactly equal to the $\gamma(G_1, \mathrm{SP})$ term appearing in the expression for the Grassmann angle $P$ between the face $F_1$ supported on the set $K_1$ and the polytope SP, where $G_1$ is an $(l - 1)$-dimensional face of SP supported on the set $L$.

Since in total only $\binom{n-k_1}{l-k_1}2^{l-k}$ such faces $G$ of dimension $(l - 1)$ contribute nonzero terms to $P'$ in (66), substituting the results for the internal and external angles, we have $P = P'$. Thus for $\frac{|K|}{n} = (1 - \epsilon_1)\mu_W$ and $C = \frac{1}{\sqrt{1 - \epsilon_1}}$, with high probability, the condition (61) holds for all $w \in \mathbb{R}^n$ satisfying $Aw = 0$.